Summary

This research project investigated the applicability of a new filter design technique, invented by Dr. Beylkin at the University of Colorado, to the development of a design automation tool for implementing filters in adaptive computing systems (ACS) using FPGA devices. Angeles Design Systems provided the design automation environment, and USC/ISI provided the adaptive computing testbed for testing the filter designs. USC/ISI also provided overall project management.

In the first year, Angeles carried out several experiments with filters designed by Colorado to evaluate the cost of these filters in ACS devices. USC/ISI provided the basis for the cost calculations, and Angeles used its automated optimization tools to calculate the cost of the hardware-optimized designs. It was determined that the filters designed with the existing Colorado techniques (Appendix A) did not reduce the cost compared to conventional filter designs. The Angeles experiments indicated that the Colorado technique reduced the computational cost significantly but increased the memory cost. Because USC/ISI research indicated that memory cost dominates in ACS devices, the overall cost was not reduced.

In response to these results, in the second year Colorado further developed the theory and created the “sub-sampling” factored FIR filter design technique (Appendix B). Experiments conducted by Angeles using its design automation environment (DSPCanvas) indicated that this new technique yields the same computational cost reductions as the original technique while dramatically lowering the memory costs. A stringent filter required in a radar system at MIT Lincoln Labs was used as the benchmark; USC/ISI researched applications and identified this radar filter for benchmarking. The new technique developed under this program yielded an overall cost reduction of roughly 70% for the radar filter compared to conventional filter design techniques (the sub-sampled factored filter costs about 33% of the conventional design; see the Results and Discussion section of this report).

In the third year, Colorado provided support for integrating its filter design software with the Angeles design automation environment. Angeles developed the ACS (FPGA) design generator (in VHDL) to translate the filter design into an actual hardware implementation, and experimented with different ACS-based architectures to determine the lowest-cost memory implementation. USC/ISI tested the hardware designs on its ACS platform to validate the design automation tools.
Introduction

Filters form a substantial part of many defense signal-processing systems, such as radar receivers. Because system requirements change with the theater of operation, the filter designs need to be changed frequently. Manufacturing new hardware for every new requirement is expensive and also limits the military’s agility to change the system in the field. ACS technology developed at USC/ISI allows the same hardware to be reprogrammed in the field for different filter functions, thus eliminating new hardware costs and deployment delays. ACS hardware, however, requires filter design techniques that reduce the cost (size) of the filter computations and memory so that the filters fit in ACS devices. Angeles Design Systems has commercial DSP design tools that optimize filters for target hardware.

To leverage ACS filter implementation for DoD applications, USC/ISI, Colorado and Angeles partnered on this project to develop and implement a filter design tool whose resulting filter hardware can be implemented in ACS devices.


Methods, Assumptions and Procedures

Methods:

1. Sub-sampling (Appendix B)

2. System Solve for cost analysis and hardware optimization (ref web site)

3. USC/ISI ACS platform

Assumptions:

1. Target technology is ACS devices

2. Filter design requirements can be expressed by frequency response and SNR specifications

3. System stability requires polynomial filters

Procedures:

1. Identified benchmark

2. Experimented with conventional and proposed techniques

3. Evaluated results using actual ACS hardware parameters

4. Developed new techniques based on experimental results to achieve the goal of cost reduction while maintaining automation of the design techniques.
Results and Discussion


The benchmark radar receiver filter represents an extremely stringent requirement, which usually results in high-precision computational hardware and thus drives up cost.

The new techniques developed under this program provide not only a cost reduction of roughly 70% compared to conventional techniques but also better signal-to-noise ratio (SNR) performance. This is because the proposed filter function requires less precision than conventional filters to achieve the same performance.

A key reason for the success of this project was the ability of USC/ISI and Angeles to use real-world hardware knowledge and design tools to pinpoint the sources of cost in the filter design, together with Colorado’s ability to develop a technique that specifically minimizes these costs while providing a fully automated design procedure.

The radar front-end filter (obtained from MIT Lincoln Labs) converts real ADC data into in-phase (I) and quadrature (Q) radar samples in the IF stage of the receiver. This requires I and Q channel filters for the conversion from IF to baseband.
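As a minimal sketch of the IF-to-baseband step, assume (the report does not say this) that the 2.5 MHz IF carrier sits exactly at one quarter of the 10 MHz ADC rate. Mixing with $e^{-i\pi n/2}$ then only needs the values $1, -i, -1, i$, so the real (I) part is nonzero only on even-indexed samples and the imaginary (Q) part only on odd-indexed ones. The function name `fs4_downconvert` is hypothetical, and the decimating lowpass filters that follow are not shown.

```python
import numpy as np

def fs4_downconvert(x):
    """Mix a real IF signal centered at fs/4 down to complex baseband.

    Assumes the carrier is exactly fs/4, so the local oscillator
    exp(-1j*pi*n/2) cycles through 1, -1j, -1, 1j (no real multipliers).
    """
    n = np.arange(len(x))
    lo = np.exp(-1j * np.pi * n / 2)
    return x * lo

fs = 10e6        # ADC sample rate (from the spec)
f_if = 2.5e6     # IF center frequency (from the spec)
n = np.arange(4000)
x = np.cos(2 * np.pi * f_if * n / fs)   # tone exactly at the IF center
y = fs4_downconvert(x)

# The tone lands at DC with amplitude 1/2; its image at -5 MHz aliases to
# fs/2 and is what the I and Q lowpass filters must remove.
print(round(y.mean().real, 6))   # prints 0.5
```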

The  filter  specs  to  accomplish  this  are  given  below: 

• Input signal bandwidth: 250 kHz centered at 2.5 MHz

• IF input sample rate from ADC: 10 MHz

• Output baseband sample rate: 625 kHz (decimation by 8 in each filter)

• Passband cut-off (for each filter): 125 kHz

• Stopband edge (for each filter): 312.5 kHz

• Stopband attenuation: 90 dB
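A quick sanity check on this specification can be made with Kaiser's classical order estimate, $N \approx (A - 7.95)/(14.36\,\Delta f)$, where $A$ is the stopband attenuation in dB and $\Delta f$ the transition width normalized to the filter's sample rate. The sketch assumes, which the report does not state, that each channel filter runs at 5 MHz (half the ADC rate, as in an fs/4 I/Q split), so that decimating by 8 gives the 625 kHz output rate. Under that assumption the estimate lands close to the 150th-order conventional filter shown below; the helper name `kaiser_order` is hypothetical.

```python
import math

def kaiser_order(atten_db, f_pass, f_stop, fs):
    """Kaiser's estimate of the FIR order for a given stopband attenuation.

    atten_db: stopband attenuation in dB (positive number)
    f_pass, f_stop, fs: band edges and sample rate in Hz
    """
    delta_f = (f_stop - f_pass) / fs          # normalized transition width
    return math.ceil((atten_db - 7.95) / (14.36 * delta_f))

fs_channel = 5e6                 # assumed per-channel input rate (10 MHz / 2)
decimation = 8
f_out = fs_channel / decimation  # 625 kHz, matching the spec
order = kaiser_order(90.0, 125e3, 312.5e3, fs_channel)

print(f_out, f_out / 2, order)   # 625000.0 312500.0 153
```

The stopband edge coincides with the output Nyquist frequency (312.5 kHz), so every component that aliases after decimation comes from a region the filter attenuates by at least 90 dB.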

The same filter was designed using three different algorithms, with the responses and costs shown below. Note that the sub-sampled factored FIR filter developed on this project costs 33% of the conventional filter design currently used in the radar front end, while providing superior attenuation and SNR performance. The design and implementation of this filter in ACS devices (FPGAs) has been fully automated, as described in Appendix C.
150th-order conventional FIR filter frequency response

Frequency response of factored FIR filter


Frequency response of sub-sampled factored FIR filter
FIR Filter Cost/Performance:

Design: Optimized
Attenuation: -90.4471
SNR: 91.1 dB
Coefficient precision: <20, 20, 7>
Datapath precision: <20, 20>
Cost: 9608
Factored FIR Cost/Performance:

Cost: 6916.2
Attenuation: -99.1365
SNR: 54.27 dB
Coefficient precision: FIR <30, 30, 8>; Factored FIR <40, 22, 5>
Datapath precision: <40, 40>
Taps: 80

Sub-sampled factored FIR Cost/Performance:

Cost: 3154.55
Attenuation: -118.54
SNR: 116.4 dB
Coefficient precision:
  Fac fir 4: 25, 8, 4
  Fir: 36, 31, 8
  Fac fir 0: 8, 4, 1
  Fac fir 1: 4, 2, 1
  Fac fir 2: 4, 2, 1
Datapath precision: 35, 33
Factors: Taps 0 = 8, Taps 1 = 8, Taps 2 = 8, Taps 3 = 8, Taps 4 = 35
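The headline percentages can be recomputed from the three cost figures above. This sketch only divides the reported numbers; it assumes the “Cost” entries of the three tables use the same units, which the report implies but does not state.

```python
# Cost figures copied from the three cost/performance tables above.
costs = {
    "conventional (optimized) FIR": 9608.0,
    "factored FIR": 6916.2,
    "sub-sampled factored FIR": 3154.55,
}

baseline = costs["conventional (optimized) FIR"]
for name, cost in costs.items():
    ratio = cost / baseline
    print(f"{name:30s} {100 * ratio:5.1f}% of baseline, "
          f"{100 * (1 - ratio):5.1f}% reduction")
```

The sub-sampled factored design comes out at about 33% of the baseline cost, a reduction of about 67%, which the text rounds to 70%; the factored FIR alone costs about 72% of the baseline.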
Conclusions

In conclusion, the research objective of developing an automated filter design technique for low-cost ACS implementation was achieved. Colorado developed a new theory as well as a computer program. Angeles integrated the Colorado design program into its commercial design environment and developed cost calculation tools and hardware generation tools specifically for ACS implementation. USC/ISI provided the ACS expertise and knowledge base and carried out the final demonstration of the project.

A key impact of this project was the combination of theory (Colorado), practical design tools (Angeles) and actual hardware prototyping (ISI). Based on the feedback from the tools and the hardware, the original theory was modified to develop a completely new type of filter structure, which is well suited for ACS implementation and provides substantial advantages over conventional filter structures.

In this project, the filter theory was advanced to reduce the memory requirements in the ACS device. A future project could aim at further advancing the filter theory so that the coefficients are optimized toward values that reduce the ACS hardware requirements.

Other techniques for reducing filter complexity generically reduce the total computational needs. As established in this project, however, this may not reduce the actual hardware. The uniqueness of this project lies in the fact that the actual factors affecting ACS hardware complexity were identified first, and the filter theory was then developed to minimize these factors. This has led to the most powerful ACS filter design approach.


References

See appendices A, B, and C.
Appendix A


A linear system and explicit solutions

for approximate linear phase filters

Univ. of Colorado at Boulder Technical Report APPM 415, November 1999

Lucas Monzón

Department of Applied Mathematics
University of Colorado at Boulder
Boulder, CO 80309-0526, USA

(303) 492-4273, fax (303) 492-4066, e-mail: lucas.monzon@colorado.edu
EDICS Category: SP 1-STTM Theory and methods

Abstract

Filters with exact or approximate linear phase response around the origin have been studied for many years and applied to a variety of problems. In practice, we may need filters satisfying other properties besides linear phase. These additional properties may prevent exact linear phase and impose restrictions on the order of approximation. Since the order of linear phase approximation depends on the number of vanishing higher derivatives of the phase around zero, these conditions have been thought to impose nonlinear conditions on the filter coefficients.

It is shown here that any order of linear phase approximation is equivalent to a set of linear conditions on the coefficients of the filter. The linear system is derived from a symmetry assumption on the filter frequency response, and these conditions are the same whether the filter is FIR, IIR, rational or irrational, analog or digital. In the latter case, the system corresponds to vanishing shifted odd moments of the filter coefficients. The shift is an arbitrary real number which equals the DC group delay.

Simultaneous phase and amplitude approximation can also be shown to lead to a linear system. For digital filters, the flatness of the amplitude response around the origin is equivalent to vanishing shifted even moments of the coefficients.

Explicit expressions for digital FIR filters with optimal phase approximation or with optimal simultaneous amplitude and phase approximation of an ideal response are derived. In both cases, nonoptimal approximations are expressed as linear combinations of the optimal filters. By identifying all possible cases, we obtain the basic building blocks present in a variety of filter designs.

I Introduction


Filters with linear phase response are of considerable importance for a variety of applications. Unfortunately, other desirable properties of the filters could be incompatible with the linear phase property. For example, except for the Haar filter, there are no FIR perfect reconstruction filters with exact linear phase [11, 2], even though it is still possible to design examples if one asks for nearly linear phase [6]. These filters are named coiflets [3] and, as shown in [3], their construction is greatly simplified by an appropriate linear change of variables involving the DC group delay. A similar approach is followed here, and the delay is always viewed as an extra parameter that can take any real value.

It is shown here that approximate linear phase around the origin can be characterized by simple, linear conditions on the coefficients of the filter. In fact, the result follows from an abstract property of functions and can be applied to a variety of filters provided they have real coefficients. The ALP (approximate linear phase) system is explicitly solved for digital FIR filters, and we obtain explicit expressions for maximally flat delay filters of any length and any DC group delay. These filters are particular cases of hypergeometric functions and can be associated with a variety of special functions. Thus, recurrences, location of zeroes, integral representations, and many other properties are available.

In agreement with theoretical results, these optimal ALP filters coincide with those derived in other constructions. For a range of delay values, these optimal solutions can be obtained by appropriate transformations of Abele's maximally flat distributed linear phase filters (see [1], [7] or [10, Section 6.5]). They can also be obtained directly from Thiran's all-pole digital filters with maximally flat delay [14] by reversing the delay sign and multiplying by an appropriate constant. However, the proof presented here is more general and covers all possible values of the delay, even those values leading to a singular system. The solutions of these singular systems are also fully described. These solutions include symmetric polynomials, that is, filters with exact linear phase. In this way, exact linear phase is described as a particular case of the ALP system that only arises for integer or half-integer choices of the delay but is nevertheless naturally integrated into the general framework of ALP filters.

With respect to simultaneous amplitude and phase approximation, it will be shown that if the order of amplitude approximation is at most twice the ALP order, then the amplitude approximation conditions are also linear conditions on the coefficients of the filter. Using this result one can easily derive the well known optimal FIR approximation of an ideal fractional delay filter (see [7] and references therein). These optimal filters are also hypergeometric functions.

For clarity, we first present a brief summary of a program to approach other linear phase designs.
IIR filters. Many IIR descriptions can be obtained using those for the FIR case. For example, for digital filters $H = P/Q$ where $P$ and $Q$ are FIR filters, any order of linear phase approximation for $H(z)$ is equivalent to the same order for $P(z)\,Q(z^{-1})$.

Analog filters. Distributed filters with any order of ALP can be obtained by the standard bilinear transformation applied to a digital filter with the same order of approximation. Since FIR filters are transformed into IIR filters, the distributed case can be obtained once we know the appropriate solutions for both FIR and IIR digital filters. The optimal lumped FIR filters are Bessel polynomials. These can be used to obtain other FIR or IIR filters.

Nonoptimal filters. A filter with any order of linear phase approximation can be expressed as a linear combination of optimal ALP filters. The constants in the linear combination are free parameters that can be used to impose additional properties.

Simultaneous amplitude and phase approximation. The optimal FIR filters approximating an ideal fractional delay can be used to describe other simultaneous approximations where the orders of approximation of the amplitude and phase differ.

We now point out some of the advantages of the program presented.

• The linear formulation and the recognition of common properties for all ALP filters yield a general framework for the study of linear phase properties.

• Optimal ALP filters are of interest in their own right, but their properties are also important for the design of all other ALP filters. Reciprocally, known constructions can be recast in terms of these filters, and previous results can be used to further understand the properties and structure of the optimal cases.

• The previous points also apply to simultaneous amplitude and phase approximation, provided that the order of amplitude approximation is at most twice the order of phase approximation.

In this paper we focus on deriving the properties common to all ALP filters and their consequences for digital FIR filters. Other filter designs will be discussed elsewhere.

The paper is organized as follows. The conditions for ALP and for simultaneous amplitude and phase approximation are presented in Section II. In Section III these general results are specialized to digital filters, and necessary conditions for the filter magnitude to be less than unity are derived in terms of the delay. The linear system for ALP around an arbitrary frequency is also presented. In Section IV all ALP FIR digital filters are described as linear combinations of maximally flat delay filters, for which explicit expressions are derived. Integer or half-integer choices of the delay lead to a representation involving maximal and symmetric polynomials. Two different examples are presented in Section V. In the first, simultaneous amplitude and phase approximation filters are described as linear combinations of explicit optimal filters which approximate an ideal fractional delay. In the second example we discuss some properties of maximally flat delay filters.

Notation

$D$ denotes the derivative operator and $xD$ the operator that maps $f(x)$ to $x f'(x)$. For any operator $F$, its $n$-th iteration is denoted $F^n$, where $F^0$ is the identity operator. We assume enough derivatives for all functions under consideration.

The set of real numbers is denoted by $\mathbb{R}$ and the set of integers by $\mathbb{Z}$. $\mathbb{R}_N[x]$ is the set of polynomials with real coefficients and degree less than or equal to $N$. Polynomials always occur in positive powers of the variable. We will use the following sets of symmetric polynomials:

-  {F  e  R.v[,V| :  C'-Fi;!)  - 
>^,Np-lFe!^v;deg):F>-.V  and  OiO-lh. 

For  example.  is  simmetric  because  bdongs  to  ^  e\cn  thougti  it  does  not  belong  to  or  t^p- 

$F(a, b; c; x)$ is the Gauss hypergeometric series of upper parameters $a$ and $b$, lower parameter $c$, and argument $x$.

By a filter we always imply its z-transform. We only consider real low-pass filters, and thus an FIR filter is a polynomial with real coefficients $\{h_k\}$ and $H(1) = 1$.

The factorial powers and generalized binomial coefficients are defined for any complex $a$ and any nonnegative integer $k$ as $a^{\underline{0}} = 1$, $a^{\underline{k}} = a(a-1)\cdots(a-k+1)$, and $\binom{a}{k} = a^{\underline{k}}/k!$.

For $h > 0$ we use the notation $[a : b : h] = \{a + kh : k \in \mathbb{Z} \text{ and } 0 \le kh \le b - a\}$. We use parentheses instead of brackets to exclude endpoints.

The symbol $\delta_{nk}$ is defined as $\delta_{nk} = 1$ if $n = k$, and $\delta_{nk} = 0$ otherwise. $\bar{c}$ denotes the complex conjugate of the complex number $c$.
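For readers who want to evaluate the Gauss series numerically, the standard definition $F(a, b; c; x) = \sum_{k \ge 0} \frac{(a)_k (b)_k}{(c)_k\, k!} x^k$, with the rising factorial $(a)_k = a(a+1)\cdots(a+k-1)$, can be summed term by term for $|x| < 1$. This is a minimal sketch, not a robust implementation; the helper name `hyp2f1_series` is hypothetical, and the classical identity $F(1, 1; 2; x) = -\ln(1-x)/x$ serves as a check.

```python
import math

def hyp2f1_series(a, b, c, x, terms=200):
    """Partial sum of the Gauss hypergeometric series F(a, b; c; x).

    Uses the term ratio t_{k+1}/t_k = (a+k)(b+k) / ((c+k)(k+1)) * x.
    Only meaningful for |x| < 1 and c not a nonpositive integer.
    """
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * x
    return total

x = 0.5
print(hyp2f1_series(1, 1, 2, x), -math.log(1 - x) / x)  # both about 1.386294
```

When $a$ is a nonpositive integer the series terminates, so the same routine also evaluates hypergeometric polynomials exactly, for example $F(-2, 1; 1; x) = (1-x)^2$.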
II Conditions for approximate linear phase and for simultaneous amplitude and phase approximation

We first present an informal approach to the conditions for ALP. Write


where the real functions $a$ and $p$ are the amplitude and phase response of the filter $H$ and $\gamma$ is the DC group delay.

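Using the Taylor expansion of the exponential, for $f(t) = H(e^{it})$ with $H(z) = \sum_k h_k z^k$ we obtain

$$f(t)\,e^{-i\gamma t} = \sum_{n \ge 0} \frac{(-1)^n t^{2n}}{(2n)!}\, m_{2n} \;+\; i \sum_{n \ge 0} \frac{(-1)^n t^{2n+1}}{(2n+1)!}\, m_{2n+1}, \qquad (2)$$

where the real numbers

$$m_n = \sum_k (k - \gamma)^n\, h_k \qquad (3)$$

are the shifted moments of the sequence $\{h_k\}$.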

From (1), for $H$ to be ALP, we expect the function $f(t)\,e^{-i\gamma t}$ to be close to a real function. For its imaginary part to vanish up to a certain order, (2) indicates that some odd moments should vanish. We will show that this is indeed the case and that $m_{2n+1} = 0$ for $0 \le n < N$ is actually equivalent to $\left. D^{2n+1}\big(p(t) - \gamma t\big)\right|_{t=0} = 0$ for $0 \le n < N$.
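The shifted moments $m_n = \sum_k (k-\gamma)^n h_k$ are straightforward to compute, which makes the link to exact linear phase concrete: for a symmetric coefficient sequence of degree $N$ with $\gamma = N/2$, every odd moment vanishes, so the imaginary part of $f(t)e^{-i\gamma t}$ is identically zero. This is a small illustrative sketch; the helper name `shifted_moment` and the example coefficients are hypothetical.

```python
def shifted_moment(h, gamma, n):
    """m_n = sum_k (k - gamma)^n h_k, the shifted moment of the coefficients h_k."""
    return sum((k - gamma) ** n * hk for k, hk in enumerate(h))

# A symmetric (exact linear phase) filter of degree N = 4, normalized so H(1) = 1.
h = [1 / 9, 2 / 9, 3 / 9, 2 / 9, 1 / 9]
gamma = 2.0   # DC group delay = N / 2 for a symmetric filter

print([round(shifted_moment(h, gamma, n), 12) for n in range(6)])
```

Shifting $\gamma$ away from $N/2$ immediately breaks the first odd condition: since $m_1 = \sum_k k h_k - \gamma$, only the true DC group delay makes $m_1$ vanish.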

If the filter is ALP, the first term of the sum in (2) should somehow approximate the amplitude response. This intuition is also correct and, as shown in Corollary 3, flat amplitude around zero is equivalent to vanishing even moments.

Since we would like to apply our results to real filters $H$ whose frequency responses can take the form $H(e^{it})$, $H(it)$, or $H(i\tan t)$, we consider complex-valued functions $f(t)$ such that $f(-t) = \overline{f(t)}$. That is, we ask the real and imaginary parts of $f$ to have even and odd symmetry. With the low-pass condition $f(0) = 1$, we can write $f(t) = a(t)\,e^{ip(t)}$, where the amplitude $a(t)$ is an even real function and the phase $p(t)$ is an odd real function.

Our first result is simple to prove, but it has far-reaching consequences.
Theorem 1 Let $f(t)$ be a function that takes complex values and such that $f(-t) = \overline{f(t)}$, $f(0) = 1$. Consider its representation in a neighborhood of $t = 0$,

$$f(t) = a(t)\, e^{ip(t)}, \qquad (4)$$

where $a$ is an even and $p$ an odd function. For $\gamma$ a real number, $N$ a positive integer and all integers $n$, $0 \le n < N$, the following equations are equivalent

-  rf„a,  aiaf 

Conivfniiilfy, 

—  )  Of  uf -hO.  p") 

Proof  Lfii  ftpni(4li 

InCram-lniiiCfXH-ArtO-TO-  (S|i 

Using  Lhdf  ts  an  «t31  function. 

!?=•*  “tin  FXO>  -  \f<0-r(Xaf. 

Ttvc  rvsuU  follows  from  the  lemma  in  .'\[^«aulU  A  biixaiia:  ln'Cr(0ti)  7^  A  ■ 

For the magnitude of $f$ in (4) to be flat around zero, we need the derivatives of the function $a$ to vanish at zero. When $f$ has ALP, the next theorem implies that simultaneous phase and amplitude approximation is equivalent to vanishing even derivatives of $f(t)\,e^{-i\gamma t}$ at $t = 0$.

Theorem 2 Let $f$, $a$, and $p$ be as in (4), and let $N$ and $M$ be any positive integers.

/.  (f  P®"  '■'pil^  —  tAib  .for  0  <  «  <  iV,  tJrn 

U^aiOf  -  W0!i  .Hr  0<n<  .V.  (S? 

2.  (f  _  -jn^u  frr  0  <  ti  <  iVf,  tivn 

^.fCf  >KW  -  +  /P'Cf'CO  -  .for  0  <  re  <:  ivr.  ( lOl 

Proof  Write  a(0  “  f^Cf  )®iOi  whae  —  e~^,nXf  and.  biecattse  of  the  condition  on  the  phase  p, 
for some function $k$.


We  haft's  ^  for  0<  <r<  2.V,  and  thuslKefiri  part  foUo>v5biecati3:forO<  «<  N^, 

r**  ^  n3*-‘  nf" 

FforiKescaiad  pari.^vriieiVr  —  iV +#, where#  isOor  1.  ^'Theorem.  1, 

-  T^a  for  0<ii<N, 


and  iheptwioua  part  impU«5i?^<a|flt|  —  #,a  for  0  <  «  <  W  or  C^'a^Oti  —  #ai  for  0  <  «  <  iVT,  becaua; 
a  is  an  e\'en  function. 

We  obtain  the  result  taking  derlt'atlt'es  in  (8^  and  apptving  the  following  conK^uei^e  of  i^-li  to 
jrtt-)  —  In  X  and  —  F(r)  or  eCt'^  —  aCf). 

#«i  far  0<k<M,tlvitiy'(ga\c^ji}—iy'\t<^)Pi^ie(^p  f(fr  0<B<lvr.  ■ 


III Conditions for digital filters

Let $f(t) = H(e^{it})$ be the frequency response of a digital filter $H$. We have

$$\left. D^n\big(f(t)\,e^{-i\gamma t}\big)\right|_{t=0} = i^n\, m_n, \qquad (11)$$

where $m_n$ are the moments defined in (3). We now summarize the relationship between these shifted moments and the derivatives at zero of the functions $a$ and $p$.

Corollary 3 Let $H$ be a function with Laurent expansion on the unit circle $H(z) = \sum_k h_k z^k$, where $\{h_k\}$ are real, $H(1) = 1$, and $H(e^{it}) = a(t)\,e^{ip(t)}$, where $a$ and $p$ are, respectively, even and odd.

Crtiasef  ffl-  0  <  B  <  ifff.foflhi'nrjraiiaftffoiesarvofeiivjlhrf. 


C.Xmplttude#  (f.''fa^i  -  Ofor  0<  «  <  .V f/se 


.for  a<B<.V. 


(Higher derivative^  (Tlifl,  ••  #^.for  0<  b  <  iVf  iJms 


The value of the delay $\gamma$ affects the overall response of the filter. If $H(1) = 1$,

ihe  delaft^  y  etjuala  Liiii  i  •  Frcm  [&  Proposition,  iil.  >ve  obtain 

Proposition 4

If $\max_t |H(e^{it})| \le 1$, then $0 \le \gamma \le N$.

This result is not evident, because the coefficients $h_k$ are not necessarily positive, and then $\gamma$ does not need to be their center of mass. The reciprocal of (13) is not true, as can be seen by choosing any symmetric polynomial with magnitude response not bounded by unity.

If the center of the passband is at a frequency $\omega \in (0, \pi)$, a simple generalization of Theorem 1 leads to the following system for ALP around $\omega$,

(fc-  7^^  V*’’  —  0  for  0  <  u  <  iV. 

For real coefficients $\{h_k\}$ we have the $2N$ equations

where $0 \le n < N$.


IV Description of all approximate linear phase FIR digital filters

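For real $\gamma$ and nonnegative integers $\ell$ and $N$ let

$$L^{\ell}_{\gamma,N} = \left\{P \in \mathbb{R}_N[x] : \big((xD - \gamma)^{2n+1} P\big)(1) = 0 \ \text{for}\ 0 \le n < \ell\right\},$$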
Lj^^-{Pe4f'^:deg:Pl-iV  and  P(l>-1)-. 

When $\ell = N$ we drop the superscript $\ell$, as in $\mathcal{L}_{\gamma,N}$.

A filter $H$ in $\mathcal{L}^{\ell}_{\gamma,N}$ has ALP of order $\ell$. We will see that the order $\ell$ cannot be greater than $N$ except if $H$ has exact linear phase; in that case $\gamma$ necessarily belongs to $[0 : N : \tfrac12]$. For $\gamma$ outside $[0 : N : \tfrac12]$ there is only one polynomial in $\mathcal{L}_{\gamma,N}$.
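Because the ALP conditions are linear, the unique normalized filter of degree $N$ for a delay $\gamma$ outside $[0 : N : \tfrac12]$ can also be found numerically by solving the $(N+1)\times(N+1)$ system $\sum_k h_k = 1$, $\sum_k (k-\gamma)^{2n+1} h_k = 0$ for $0 \le n < N$. The sketch below does exactly that with NumPy; it is a numerical illustration of the linear formulation, not the explicit formula of Theorem 5, and the function name `max_flat_delay_fir` is hypothetical. The final check confirms that the phase of $H(e^{it})$ equals $\gamma t$ to high accuracy near $t = 0$.

```python
import numpy as np

def max_flat_delay_fir(gamma, N):
    """Solve for h_0..h_N with H(1) = 1 and the first N odd shifted moments zero.

    Assumes gamma lies outside {0, 1/2, 1, ..., N}; for some values in that
    set (e.g. gamma = N/2 with N >= 2) the matrix is singular.
    """
    k = np.arange(N + 1, dtype=float)
    rows = [np.ones(N + 1)]                                  # H(1) = 1
    rows += [(k - gamma) ** (2 * n + 1) for n in range(N)]   # m_{2n+1} = 0
    rhs = np.zeros(N + 1)
    rhs[0] = 1.0
    return np.linalg.solve(np.array(rows), rhs)

gamma, N = 1.3, 3
h = max_flat_delay_fir(gamma, N)

# The phase of H(e^{it}) = sum_k h_k e^{ikt} should be close to gamma * t near t = 0.
t = 0.05
H = np.sum(h * np.exp(1j * np.arange(N + 1) * t))
print(h, np.angle(H) / t)
```

A direct linear solve is adequate for small $N$; for long filters the matrix becomes ill-conditioned, which is one practical reason to prefer closed-form expressions.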

Theorem 5 Let $\gamma$ be a real number and $N$ a positive integer. Then


^■r 


9  r  ye[0-A:i) 

T-  U’lt/f  0  <  ff  <  M 

mi 


f Krf/itrniprr-  Otc  cf iKtyneniaki  of (fr;ifTW  iV  aisf  nacf  ftamr  ;Vftw  cpnsf^  a'Wf  f/ir  xt 


BSKS.V 


Proof See Appendix C.

In order to describe the set $\mathcal{L}^{\ell}_{\gamma,N}$, we first obtain all $A \in L^{\ell}_{\gamma,N}$ and then ask for the normalization $A(1) = 1$. For most $\gamma$, $L^{\ell}_{\gamma,N}$ turns out to be a subspace of $\mathbb{R}_N[x]$ of dimension $N + 1 - \ell$, but its description is different depending on whether $\gamma$ lies inside or outside the set $[0 : N : \tfrac12]$.

Corollary 6 Let $N$ be a positive integer and $\gamma$ any real number outside $[0 : N : \tfrac12]$. If $0 \le \ell \le N$,

61  a  ho!*!  tjf  aiijfor  ON,  if/  —  [0]- . 

Proof Let $0 \le \ell \le N$. That the dimension of $L^{\ell}_{\gamma,N}$ is $N + 1 - \ell$ can be obtained from the proof of Theorem 5. Since these are $N + 1 - \ell$ polynomials of different degrees in $L^{\ell}_{\gamma,N}$, they are a basis for that space.

\Vh«nf  >  N,  asaimeA  ji  0  In  and  let  M  <  Nbethedegjeeof  A.  CUsrly  if  and  then 
A(;^  —  \Lf  for  some  aniant  But  deghjr  —  f  >  M  and  thus  A  —  0,  a  contradiction,  it  foUoivs 
lh^£^'^-{0)-.  ■ 

We  stitl  need  to  consider  the  case  w-hen  y  belongs  to  [0 :  iV ;  il  or  simply  7  e  [0 :  •$  :  becaitsc 
ivilh  a  ”  —  1  implies 


In  the  next  coroliacyi  ive  show  that  when  (>  N  -  \  -  y.  If/  equals  .  Obsene  that  t  an  be 
arbltrarity  large  because  this  case  corrvsfxinds  to  exart  Unar  phase.  When  f  <  iV  -  equals 


the  direct  sum  of  and  the  subspaca  or  cf+i  ^  j  depending  on  whether  7  is  an 

integeror  a  had  integer. 


Corollary  7  Let  7  e  [0 ;  4  ;  ij.  Tim. 


7eli>:T:lI 

7e[i:^  :11. 


,Y-|-l-r  ^  f  <  /sr-  l-T, 

aid  re[a;i;lj. 


T  +  j  ^  f>iV-l-7  ati  7e[j;4;l|. 


Proof  Let  7  e  [0 :  4  :  ij.  Clearly  C  L^.  WnK  ihe  ajnMnlion.  #  —  i  if  7  is  a  haW  Uitc^r  and 
5  —  1  i/  7  is  an  integer.  \vb  haftt 


i»eca<iseCJc£^"*C*'“^*’^'^)('(^  >  ••  0  for  ail  0  <  «  <  f  and  ail  d.  e 
Also,  if  A  e  5^ 

/!{;>- and  A(:>  - 


for  some  folmomial  ff .  Therrfore  Bt”  Bt>  and  then  B  —  0  and  consenumtly  A  —  0. 

IVeclaim  that  for  any  A  e  llvae  exist  R  e  and  p  £  sudi  that  A  —  K  -h  P.  Q.M31  A 

>ve  choose  iC  lo  match.  Ihe  first  7 -(-^coefficients  of  A  and  define  P  of  degree  at  most  iV  —  ^-7siuii 
that  A  —  K  —  c'l^^P.  ftxause  of  the  conditions  on  A  and  fC,  P  £ 


R>r  the  dunenions  note  that  —  {O}-,  if  f  >  Y— #  -7,  and  that  dim  —7-1-#.  ■ 


V Examples

The following examples illustrate the previous descriptions. We will use some results on hypergeometric functions and Stirling numbers [5].

V.1 Simultaneous amplitude and phase approximation of an ideal response

We use Corollary 3 to construct filters with flat amplitude and flat group delay around zero.

Observe that for any function $A$, the following four conditions, valid for all $k$ with $0 \le k < N$, are equivalent:


Ct-PriiA-^ACa-Wlf  -  #«. 


C16f 

07) 

(JSf 


fA-i?)PA(if  -  r. 


i.T'Ct-’/KA-Wif  - 
With (17), the maximally flat filters can be obtained as


'‘““-.I.O-'''- 


(19> 


For other methods to derive these solutions, see [7, page 40] and references therein. The polynomials $A_{\gamma,N}$ approximate the ideal fractional delay filter.


The frequency response of $A_{\gamma,N}$ has “optimal” flat amplitude and flat group delay around zero. To see which values of the delay lead to exact linear phase, use (19) to obtain


and  then  eiilter  t  —  ^  or  y  belong  to  [0  :  hr ;  1 J  and  .  The  case  t  ”  ^  oirrvspjnds  to  a 

particular case of Herrmann's linear phase maximally flat amplitude filters [6].
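A short numerical check of the fractional-delay connection: the conditions $D^k A(1) = \gamma^{\underline{k}}$ for $0 \le k \le N$ are satisfied by the Taylor polynomial of $x^\gamma$ at $x = 1$, namely $A(x) = \sum_{k=0}^{N} \binom{\gamma}{k}(x-1)^k$, which is the classical Lagrange-interpolation fractional delay filter. We present this as a sketch under that assumption, not necessarily in the form used in the text's explicit expression; the function names are hypothetical. Its shifted moments satisfy $m_n = \delta_{n0}$ for $0 \le n \le N$, and for $\gamma = N/2$ the coefficients come out symmetric, in line with the exact linear phase cases $\gamma \in [0 : N : \tfrac12]$.

```python
from math import comb

def gen_binom(gamma, k):
    """Generalized binomial coefficient (gamma choose k) for real gamma."""
    c = 1.0
    for j in range(k):
        c *= (gamma - j) / (j + 1)
    return c

def fractional_delay_fir(gamma, N):
    """Coefficients h_0..h_N of sum_{k<=N} binom(gamma, k) (x - 1)^k in powers of x."""
    h = [0.0] * (N + 1)
    for k in range(N + 1):
        ck = gen_binom(gamma, k)
        for j in range(k + 1):   # (x - 1)^k = sum_j C(k, j) x^j (-1)^(k - j)
            h[j] += ck * comb(k, j) * (-1) ** (k - j)
    return h

def shifted_moment(h, gamma, n):
    """m_n = sum_k (k - gamma)^n h_k."""
    return sum((k - gamma) ** n * hk for k, hk in enumerate(h))

h = fractional_delay_fir(1.5, 3)
print(h)   # [-0.0625, 0.5625, 0.5625, -0.0625]
print([round(shifted_moment(h, 1.5, n), 12) for n in range(4)])   # [1.0, 0.0, 0.0, 0.0]
```

For an integer delay such as $\gamma = 1$ with $N = 3$ the series truncates to $A(x) = x$, a pure one-sample delay.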

The phase and amplitude responses of $A_{\gamma,N}$ are related to the moments of the coefficients by


pS’t-l 


1* 


where0<  K  <  N. 

In conclusion, the first N derivatives at ω = 0 of both the amplitude and the phase deviation φ(ω) − τω vanish, and the next N derivatives can be computed in terms of higher moments of the coefficients of A_{N,τ}.

We now consider the problem of obtaining an FIR filter H with different flatness parameters N_1 and N_2 ([illegible]):

[illegible] for 0 ≤ n ≤ N_1, and
[illegible] for 0 ≤ n ≤ N_2.

According to Corollary [illegible], the problem is equivalent to

[illegible] = 0 for 0 ≤ n ≤ N_1, and
[illegible] = [illegible] for 0 ≤ n ≤ N_2.


Write

H(z) = Σ_k λ_k [illegible],

where λ_k are constants to be determined. The properties of [illegible] yield

[illegible] = 0 for 0 ≤ n ≤ min(N_1, N_2),

and we obtain [illegible] = 1 by setting [illegible] = 1. The additional constraints on λ depend on higher moments of A_{N,τ} and can be expressed as follows. If N_2 > N_1,

[illegible] = Σ_k λ_k [illegible] for N_1 < n ≤ N_2.

If N_1 > N_2,

[illegible] = Σ_k λ_k [illegible] for N_2 < n ≤ N_1.


To compute higher moments, we can use the expansion of A_{N,τ} around zero,

[Equation illegible in the source scan.]

where F(a, b; c; x) is the hypergeometric series defined in the introduction. Then, if L is a nonnegative integer,

[Equations (20) to (22) illegible in the source scan.]

where S denotes the Stirling numbers of the second kind.
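Stirling numbers of the second kind usually enter moment computations through the standard identity x^L = Σ_{j=0}^{L} S(L, j) x(x − 1)⋯(x − j + 1). This identity converts falling-factorial moments Σ_k a_k k(k − 1)⋯(k − j + 1) into ordinary moments Σ_k a_k k^L. The sketch below illustrates that identity only. It is not a transcription of (20) to (22), and the function names are hypothetical. It builds S(L, j) from the recurrence S(L, j) = j S(L − 1, j) + S(L − 1, j − 1) and compares the two ways of computing a moment.

```python
def stirling2(L):
    """Table S[n][j] of Stirling numbers of the second kind, 0 <= j <= n <= L."""
    S = [[0] * (L + 1) for _ in range(L + 1)]
    S[0][0] = 1
    for n in range(1, L + 1):
        for j in range(1, n + 1):
            S[n][j] = j * S[n - 1][j] + S[n - 1][j - 1]
    return S


def falling(x, j):
    """Falling factorial x (x - 1) ... (x - j + 1); equals 1 when j == 0."""
    out = 1
    for i in range(j):
        out *= x - i
    return out


def moment_from_factorial_moments(coeffs, L):
    """sum_k a_k * k**L computed from factorial moments and S(L, j)."""
    S = stirling2(L)
    fact_moments = [sum(a * falling(k, j) for k, a in enumerate(coeffs))
                    for j in range(L + 1)]
    return sum(S[L][j] * fact_moments[j] for j in range(L + 1))


coeffs = [3, -1, 4, 1, -5]                # arbitrary integer test coefficients
direct = sum(a * k ** 6 for k, a in enumerate(coeffs))
print(direct == moment_from_factorial_moments(coeffs, 6))  # True
print(stirling2(4)[4])                                     # [0, 1, 7, 6, 1]
```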

The left-hand side of Eq. ([illegible]) vanishes for all τ in [0 : N : 1], because those choices of τ lead to exact linear phase and therefore all odd moments of [illegible] should vanish. Different choices of τ significantly impact the values of higher moments of A_{N,τ} and, consequently, the values of higher derivatives of its amplitude and phase.

In Figure 1 we plotted the frequency response characteristics of A_{N,τ} with τ = 3.1 and N = 8. Because of (13), the value of τ was chosen in the interval [0, 8]. For these filters both the amplitude and the group delay are flat around zero. For clarity, the values of the group delay in the passband have been shifted [illegible].
V.2  FIR filters with maximally flat group delay

For each τ outside [0 : N : 1], the polynomials defined in Theorem 5 have maximally flat group delay within [illegible]. For τ in [0 : N : 1] there are infinitely many optimal ALP solutions, including symmetric polynomials leading to exact linear phase.

As pointed out in [12], the polynomials L_N are related to hypergeometric and Legendre functions. In our notation,

[Equation illegible in the source scan.]   (22)

Note the advantage of our formulation over the one by Thiran: simply by considering a different normalization, L_N(1) = 1 as opposed to L_N(0) = 1, L_N(z) also becomes a polynomial in the variable τ. To illustrate this advantage, we now show how to derive the value of L_N at the Nyquist frequency. We claim

$$L_N(-1) = \frac{(1-2\tau)(3-2\tau)\cdots(2N-1-2\tau)}{1\cdot 3\cdots(2N-1)}.$$

First note that when τ belongs to {1/2, 3/2, …, (2N − 1)/2}, L_N(z) is a symmetric polynomial of odd degree 2τ, and therefore its value at z = −1 is zero. Thus,

$$L_N(-1) = c_N\left(\tfrac{1}{2}-\tau\right)\left(\tfrac{3}{2}-\tau\right)\cdots\left(\tfrac{2N-1}{2}-\tau\right)$$

for some constant c_N. Evaluating the previous equation at τ = 0,

$$1 = c_N\,2^{-N}\,\bigl(1\cdot 3\cdots(2N-1)\bigr),$$

we obtain the value of c_N.
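This argument implies c_N = 2^N / (1·3⋯(2N − 1)) and L_N(−1) = c_N ∏_{j=1}^{N} ((2j − 1)/2 − τ). The short check below uses hypothetical function names and exact rationals. It confirms only the two properties the argument relies on: the expression vanishes at τ = 1/2, 3/2, …, (2N − 1)/2, and it equals 1 at τ = 0. It does not compute L_N itself.

```python
from fractions import Fraction


def c_N(N):
    """Normalizing constant c_N = 2**N / (1 * 3 * ... * (2N - 1))."""
    odd_product = 1
    for j in range(1, N + 1):
        odd_product *= 2 * j - 1
    return Fraction(2 ** N, odd_product)


def nyquist_value(N, tau):
    """c_N * prod_{j=1..N} ((2j - 1)/2 - tau), the claimed value of L_N(-1)."""
    value = c_N(N)
    for j in range(1, N + 1):
        value *= Fraction(2 * j - 1, 2) - tau
    return value


N = 8
print(nyquist_value(N, Fraction(0)))                       # 1
print(all(nyquist_value(N, Fraction(2 * j - 1, 2)) == 0
          for j in range(1, N + 1)))                       # True
```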

Similarly to the case of simultaneous approximation, we can use linear combinations of the polynomials to generate filters with any order of ALP but satisfying additional properties.

In Figure 2 we plotted the frequency response characteristics of L_N with τ = 3.1 and N = 8. The amplitude response is not flat around zero, but its group delay is closer to constant when compared with the delay of A_{N,τ}.
[Figure 1 panels: Phase; Amplitude; Group delay; Passband group delay]

Figure 1: Frequency characteristics of optimal filter A_{N,τ} with τ = 3.1 and N = 8.

[Figure 2 panels: Amplitude; Phase]

Figure 2: Frequency characteristics of maximally flat delay filter L_N with τ = 3.1 and N = 8.
VI  Conclusion


A new approach to the theory and design of approximate linear phase filters has been presented. It is based on recognizing the existence of a linear formulation for phase approximation as well as for simultaneous amplitude and phase approximation.

This characterization provides linear conditions on the filter coefficients whether the low-pass filter is FIR or IIR (rational or non-rational), analog or digital. It also allows for arbitrary real values of the group delay.

In this paper we examined some consequences of this characterization for FIR digital filters. The examples presented are intended to illustrate the advantages of this formulation and also to provide explicit building blocks for other approximate linear phase designs.


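Appendices

A  A result on functions with vanishing even derivatives
The derivative of a composition can be computed via Faà di Bruno's formula,

$$D^n (f\circ g)(x) = \sum_{k=1}^{n} f^{(k)}\bigl(g(x)\bigr)\, B_{n,k}\bigl(g'(x), g''(x), \ldots, g^{(n-k+1)}(x)\bigr), \qquad (23)$$

where

$$B_{n,k}(x_1,\ldots,x_{n-k+1}) = \sum \frac{n!}{j_1!\,j_2!\cdots j_{n-k+1}!}\prod_{i=1}^{n-k+1}\Bigl(\frac{x_i}{i!}\Bigr)^{j_i},$$

the sum running over all nonnegative integers j_1, …, j_{n−k+1} with j_1 + j_2 + ⋯ + j_{n−k+1} = k and j_1 + 2j_2 + ⋯ + (n−k+1) j_{n−k+1} = n.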


Lemma 8  [Hypotheses illegible in the source scan.] Then, for all n with 0 ≤ n ≤ N,

[illegible] = 0.

Proof. Let [illegible] and assume K_1 = ⋯ = K_{2N−1} = 0. With (23), for 0 ≤ n ≤ N, all terms in [illegible] contain an index i such that i is odd and i ≤ 2N − 1. For the converse, the case n = 1 is clear; assume the result is true for n < N and let n = N. Then

K_1 = ⋯ = K_{2N−1} = 0 and

0 = [illegible] = Σ [illegible].

In the sum, the only non-zero term corresponds to [illegible]. Hence [illegible] = 0 because, for all n, [illegible]
B  Some properties of Vandermonde matrices

Let Γ be a set of n different complex numbers, Γ = [γ_0, γ_1, …, γ_{n−1}], and denote

Γ* = [illegible]  and  aΓ + b = [aγ_0 + b, aγ_1 + b, …, aγ_{n−1} + b]

for constants a and b. For 0 ≤ k < n, let L_k^Γ(x) be the unique polynomial of degree at most n − 1 such that L_k^Γ(γ_j) = δ_{kj} for 0 ≤ j < n.

Let V be the n × n Vandermonde matrix with entries V_{kj} = γ_j^k. Since Γ consists of n different numbers, V has an inverse matrix V^{−1}, whose rows are the coefficients of the L_k^Γ.


It can easily be verified that, for constants a and b,

[Equation illegible in the source scan.]

and, if Γ* has exactly n elements,

[Equation illegible in the source scan.]

For Γ = [0, 1, …, n − 1] we simply write L_k^Γ = L_{n,k}. We have

$$L_{n,k}(x) = \binom{x}{k}\binom{n-1-x}{n-1-k} = (-1)^{n-1-k}\binom{n-1}{k}\binom{x}{n}\frac{n}{x-k}.$$
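Both facts above can be checked numerically. The sketch below uses hypothetical helper names. It inverts the Vandermonde matrix with entries V_{kj} = γ_j^k by plain Gauss–Jordan elimination in exact rational arithmetic and reads the coefficients of L_k^Γ from the rows of V^{−1}. For Γ = [0, 1, …, n − 1] it then compares the resulting polynomials with the binomial form at a non-integer point.

```python
from fractions import Fraction
from math import factorial


def invert(m):
    """Inverse of a square matrix of Fractions by Gauss-Jordan elimination."""
    n = len(m)
    a = [list(row) + [Fraction(int(i == j)) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                factor = a[r][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def lagrange_coeffs(gammas):
    """Row k of V^{-1}, where V[k][j] = gamma_j**k, holds the coefficients of L_k."""
    gammas = [Fraction(g) for g in gammas]
    v = [[g ** k for g in gammas] for k in range(len(gammas))]
    return invert(v)


def poly_eval(coeffs, x):
    return sum(c * x ** i for i, c in enumerate(coeffs))


def binom(x, k):
    """Generalized binomial coefficient x (x - 1) ... (x - k + 1) / k!."""
    out = Fraction(1)
    for i in range(k):
        out *= x - i
    return out / factorial(k)


n = 5
rows = lagrange_coeffs(range(n))
print(all(poly_eval(rows[k], j) == (1 if k == j else 0)
          for k in range(n) for j in range(n)))                      # True
x = Fraction(7, 3)                                                   # not a node
print(all(poly_eval(rows[k], x) == binom(x, k) * binom(n - 1 - x, n - 1 - k)
          for k in range(n)))                                        # True
```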


C  Proof of Theorem 5


For a function f and a real number a,

[Equation illegible in the source scan.]

When a = −1 and f(x) = [illegible],

[Equation illegible in the source scan.]

for all k, 0 ≤ k ≤ 2N. Equivalently, D^k(A(x) − x^{2τ}A(x^{−1}))(1) = 0 for all k, 0 ≤ k ≤ 2N. When τ belongs to [illegible], A(x) − x^{2τ}A(x^{−1}) is a polynomial of degree at most 2N. Since its derivatives of orders 0 through 2N all vanish at x = 1, it is identically zero. Thus

$$A(x) = x^{2\tau}A(x^{-1}). \qquad (30)$$
Since A has degree N, we have 0 ≤ 2τ − N ≤ N [illegible] for any τ ∈ [illegible]. When [illegible], (31) implies A(x) = [illegible] for some [illegible]. Reciprocally, a polynomial of degree N with exact linear phase satisfies (31) for integer 2τ and 0 ≤ 2τ − N ≤ N. The last part of the theorem is proved.

Let τ be outside of [illegible] and let {a_k}_{k=0}^{N} be the coefficients of A in [illegible]. Then [illegible] is equivalent to the following Vandermonde system:

[Equation illegible in the source scan.]

for 0 ≤ n ≤ [illegible], where γ_k = [illegible] and [illegible] = N − τ.

Our assumption on τ implies γ_k ≠ γ_{k'} if k ≠ k'. Thus, there is only one solution {a_k} of this system, and it can be computed using Equations [illegible]. If Γ = {γ_k}, then for all k, 0 ≤ k ≤ N,


[Equations illegible in the source scan.]

To evaluate [illegible] we use (23). It follows that

[Equations illegible in the source scan.]

because [illegible].

Writing back γ_k in terms of τ, for 0 ≤ k < N,

[Equation (33) illegible in the source scan.]

Note that (33) is also valid for k = N and that

[Equation illegible in the source scan.]

for all τ, by [5, Eq. (22)]. Thus, to obtain L_N(1) = 1 we choose [illegible].
Appendix B


Technical  Report 

Generalized  Variable  Precision  Filters 
for Adaptive Computing Applications¹

Theoretical background for the implementation
of factored FIR approximation of IIR filters

Gregory  Beylkin  and  Lucas  Monzon 
June  6,  1999.  Revised  October  15,  1999 


Introduction 


A great variety of digital filters can be readily designed as Infinite Impulse Response (IIR) filters. These IIR filters are typically implemented via recursive algorithms or by approximations using Finite Impulse Response (FIR) filters. In the latter case, we are interested in approximating the frequency response function of the IIR filter, a rational function, by the frequency response function of an FIR filter, a polynomial function. One standard approach is to find the FIR filter by some optimization over fixed-degree polynomials.

In  our  approach  (see  [1]),  the  degree  of  the  polynomial  approximation  is  not  fixed.  Instead,  an 
efficient  and  accurate  implementation  is  achieved  by  representing  the  approximating 
solution  as  a  cascade  of  very  simple  FIR  filters.  The  degree  of  the  approximating 
polynomial  could  be  high  to  obtain  the  precision  sought  although  the  cascade 
representation  induces  a  relatively  small  number  of  operations.  The  number  of  factors  in 
the  cascade  depends  on  the  accuracy  sought  and  is  not  very  large.  Higher  accuracy  can 
be  obtained  by  adding  extra  factors  to  the  representation.  Thus,  depending  on  the  desired 
precision  on  the  filter  output,  one  can  uniquely  specify  the  number  of  factors  in  the 
cascade. Hardware efficiency comes from the fact that only the minimum number of required factors
is computed.


We note that if this technique is applied to IIR Quadrature Mirror Filters, we obtain an FIR filter
that satisfies the quadrature mirror condition with any desired accuracy.

Let  us  now  describe  the  FIR  approximation. 


Given any IIR filter H(z) = P(z)/Q(z), where P and Q are polynomials, the FIR approximation F(z) can be written as

$$F(z) = \prod_{k=0}^{n} F^{(k)}\bigl(z^{2^k}\bigr), \qquad (1)$$

where each F^{(k)}(z) is a polynomial. In particular,

$$F^{(0)}(z) = P(z)\,Q(-z),$$

and, for k > 0, the degree of F^{(k)}(z) is at most the degree of Q(z).
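The construction behind (1) can be sketched under our own assumptions: Q(0) = 1, and all roots of Q lie outside the unit circle (a stable filter). The product Q(z)Q(−z) contains only even powers, so it equals Q_1(z²) for a polynomial Q_1 of the same degree as Q. Repeating this step gives 1/Q(z) = Q(−z) Q_1(−z²) ⋯ Q_n(−z^{2^n}) / Q_{n+1}(z^{2^{n+1}}), and Q_{n+1} tends to 1. Dropping that last denominator yields F^{(0)}(z) = P(z)Q(−z) and F^{(k)}(z) = Q_k(−z). This is an illustration of the idea, not the Colorado design software, and all function names are hypothetical. Because the product telescopes to P(z) Q_{n+1}(z^{2^{n+1}}) / Q(z), the expanded filter matches the IIR impulse response exactly (up to rounding) in its first 2^{n+1} taps.

```python
def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists (lowest power first)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def flip_sign(q):
    """Coefficients of q(-z): odd-power coefficients change sign."""
    return [c if k % 2 == 0 else -c for k, c in enumerate(q)]


def stretch(p, m):
    """Coefficients of p(z**m): m - 1 zeros between taps (a longer delay line)."""
    out = [0.0] * ((len(p) - 1) * m + 1)
    out[0::m] = p
    return out


def factored_fir(p, q, n):
    """Factors F^(0), ..., F^(n) with F(z) = prod_k F^(k)(z**(2**k)); needs q[0] == 1."""
    factors = [poly_mul(p, flip_sign(q))]        # F^(0)(z) = P(z) Q(-z)
    qk = q
    for _ in range(n):
        qk = poly_mul(qk, flip_sign(qk))[0::2]  # Q_k(z) Q_k(-z) = Q_{k+1}(z**2)
        factors.append(flip_sign(qk))           # F^(k+1)(z) = Q_{k+1}(-z)
    return factors


def expand(factors):
    """Multiply the cascade out into one long FIR filter."""
    f = [1.0]
    for k, fk in enumerate(factors):
        f = poly_mul(f, stretch(fk, 2 ** k))
    return f


def iir_impulse(p, q, length):
    """First `length` taps of the impulse response of P(z)/Q(z), with q[0] == 1."""
    h = []
    for m in range(length):
        acc = p[m] if m < len(p) else 0.0
        for i in range(1, min(m, len(q) - 1) + 1):
            acc -= q[i] * h[m - i]
        h.append(acc)
    return h


p, q, n = [1.0, 0.5], [1.0, -1.2, 0.5], 4   # roots of Q have modulus sqrt(2) > 1
f = expand(factored_fir(p, q, n))
h = iir_impulse(p, q, len(f))
# The first 2**(n+1) taps agree up to rounding; later taps differ only slightly.
print(len(f), max(abs(a - b) for a, b in zip(f[: 2 ** (n + 1)], h)))
```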


¹ DARPA grant F30602-98-1-0154
Implementation reducing memory requirements

Even though each factor F^{(k)}(z) in (1) is itself a cascade of 3-tap FIR filters, there are large delays in the factors F^{(k)}(z^{2^k}), which significantly increase the cost of memory. Nevertheless, if the output of applying the filter F(z) is band-limited, we can use a subsampled version of F(z), which drastically reduces the memory cost.

Subsampling  design 

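Let X(z) be the z-transform of a sequence {x_k}, i.e.,

$$X(z) = \sum_{k\in\mathbb{Z}} x_k z^k. \qquad (2)$$

Let us denote by X_0, X_1 the polyphase components of X(z),

$$X_0(z) = \sum_{k\in\mathbb{Z}} x_{2k} z^k, \qquad (3)$$

$$X_1(z) = \sum_{k\in\mathbb{Z}} x_{2k+1} z^k, \qquad (4)$$

so that X(z) = X_0(z²) + z X_1(z²).

The operator ↓2 stands for subsampling by two, that is, it retains only the even entries of the sequence:

$${\downarrow}2 : \{x_k\} \mapsto \{x_{2k}\}.$$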

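We want to apply a filter A(z²) to a signal X(z) and then subsample the result. Let us call Y(z) the result of these two operations:

$$Y(z) = {\downarrow}2\bigl(A(z^2)X(z)\bigr).$$

We note that it is possible to reverse the order of the operations. Namely, we can first subsample the signal and then apply the filter A(z),

$${\downarrow}2\bigl(A(z^2)X(z)\bigr) = A(z)\bigl({\downarrow}2\,X(z)\bigr), \qquad (5)$$

where the order of operations is indicated by parentheses. Indeed, the coefficient of z^{2m} in A(z²)X(z) is Σ_j a_j x_{2(m−j)}, which involves only the even entries of the signal.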
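Identity (5) is easy to confirm on finite sequences. The sketch below uses hypothetical helper names and treats signals and filters as finite coefficient lists, lowest power of z first. It checks that filtering with A(z²) and then keeping the even entries gives the same result as keeping the even entries first and then filtering with A(z).

```python
import random


def convolve(a, b):
    """Coefficients of A(z)B(z) (full linear convolution)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def upsample_filter(a):
    """Coefficients of A(z**2): zeros inserted between taps."""
    out = [0] * (2 * len(a) - 1)
    out[0::2] = a
    return out


def down2(x):
    """Subsampling by two: keep x_0, x_2, x_4, ..."""
    return x[0::2]


random.seed(0)
x = [random.randint(-9, 9) for _ in range(20)]
a = [random.randint(-9, 9) for _ in range(4)]
lhs = down2(convolve(upsample_filter(a), x))   # filter with A(z^2), then subsample
rhs = convolve(a, down2(x))                    # subsample, then filter with A(z)
print(lhs == rhs)  # True
```

The right-hand side works at half the rate and stores a delay line half as long, which is the saving exploited below.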

We  now  show  how  this  observation  is  applicable  to  FIR  cascades. 


Implementation  of  subsampled  factored  FIR  approximation 

If the output Y(z) = F(z)X(z) is band-limited, then it can be subsampled. To illustrate the situation, suppose that Y is subsampled by a factor of eight and that the filter F consists of five factors, that is, n = 4 in equation (1).

We can compute Ỹ, the subsampled version of Y, using the discussion above. Each step of subsampling changes the application of a filter F^{(k)}(z^{2^k}) to the application of F^{(k)}(z^{2^{k−1}}).

In this example, we obtain (operations are applied from left to right)

$$\tilde{Y} = X\,F^{(0)}(z)\,F^{(1)}(z^2)\,F^{(2)}(z^4)\,F^{(3)}(z^8)\,F^{(4)}(z^{16})\,{\downarrow}2\,{\downarrow}2\,{\downarrow}2 \qquad (6)$$

$$\quad = X\,F^{(0)}(z)\,{\downarrow}2\,F^{(1)}(z)\,{\downarrow}2\,F^{(2)}(z)\,{\downarrow}2\,F^{(3)}(z)\,F^{(4)}(z^2). \qquad (7)$$

Clearly, there is a substantial improvement in using (7) instead of (6) for the computation of Ỹ. After applying F^{(0)}, at each stage the result is subsampled by a factor of two, with the corresponding savings in memory. The memory requirement is dictated by the power of z in the argument of F^{(k)}, and that power is reduced in all factors; for instance, F^{(4)} is applied with argument z² instead of z^{16}.
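The rewriting from (6) to (7) can be checked numerically for n = 4 and subsampling by eight. In the sketch below, the helper names are hypothetical and the five factors are random 3-tap stand-ins, not an actual filter design. The first computation runs the full cascade with factors F^{(k)}(z^{2^k}) and then subsamples three times. The second follows (7), where the longest delay line needed is that of F^{(4)}(z²) rather than F^{(4)}(z^{16}).

```python
import random


def convolve(a, b):
    """Coefficients of A(z)B(z)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def stretch(p, m):
    """Coefficients of p(z**m)."""
    out = [0] * ((len(p) - 1) * m + 1)
    out[0::m] = p
    return out


def down2(x):
    """Keep every other sample, starting with the first."""
    return x[0::2]


random.seed(1)
F = [[random.randint(-5, 5) for _ in range(3)] for _ in range(5)]  # stand-ins for F^(0..4)
x = [random.randint(-9, 9) for _ in range(64)]

# (6): full cascade with F^(k)(z^(2^k)), then subsample by two three times.
y = x
for k, fk in enumerate(F):
    y = convolve(y, stretch(fk, 2 ** k))
lhs = down2(down2(down2(y)))

# (7): F^(0)(z), down2, F^(1)(z), down2, F^(2)(z), down2, F^(3)(z) F^(4)(z^2).
y = down2(convolve(x, F[0]))
y = down2(convolve(y, F[1]))
y = down2(convolve(y, F[2]))
rhs = convolve(convolve(y, F[3]), stretch(F[4], 2))
print(lhs == rhs, len(lhs))  # True 16
```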

If Ỹ is the answer we are seeking, the computations can be stopped at this stage. If instead we
are interested in Y, we can first compute Ỹ and then upsample it to obtain Y.

We  now  discuss  further  improvements. 

Faster  implementation  of  some  factors 

By construction, the higher the value of k, the smaller the length of F^{(k)}(z) and also the smaller the absolute value of some of its coefficients. This indicates that an additional saving can be achieved if we expand these factors F^{(k)} to obtain their coefficients rather than keep them in factored form. For those factors we discard all coefficients below the target accuracy. This direct implementation is faster than applying them as a cascade.
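The following is a minimal sketch of this step. It assumes that each factor is stored as a coefficient list and is applied as F^{(k)}(z^{m_k}) for given stretch factors m_k. The tail factors from some index onward are multiplied out into one FIR filter, and taps whose magnitude is below a target accuracy eps are dropped. Where the tail starts, the threshold rule, the sparse output format and the names are all our assumptions, not the tool's actual procedure.

```python
def poly_mul(a, b):
    """Product of two coefficient lists (lowest power first)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def stretch(p, m):
    """Coefficients of p(z**m)."""
    out = [0.0] * ((len(p) - 1) * m + 1)
    out[0::m] = p
    return out


def expand_tail(factors, stretches, start, eps):
    """Multiply out factors[start:] (each applied as F(z**m)) and drop tiny taps.

    Returns the expanded filter as a sparse dict {power_of_z: coefficient}.
    """
    g = [1.0]
    for fk, m in zip(factors[start:], stretches[start:]):
        g = poly_mul(g, stretch(fk, m))
    return {k: c for k, c in enumerate(g) if abs(c) >= eps}


# Tail factors with small non-constant taps, as happens for large k.
factors = [[1.0, 0.5, 0.2], [1.0, 1e-3, 2e-4], [1.0, 1e-6, 4e-8]]
sparse = expand_tail(factors, [1, 2, 4], start=1, eps=1e-7)
print(sorted(sparse))  # [0, 2, 4]
```

Here the expanded tail has 13 taps, but only three of them survive the threshold, so the two tail factors collapse into a single short sparse filter.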

Other  possible  subsampling 

The  procedure  described  above  is  not  limited  to  the  particular  way  in  which  subsampling  is 
achieved  in  (7).  Any  linear  combination  of  the  polyphase  components  could  be  used.  Specifically,  with 
the  notation  of  (3)  and  (4),  any  operator  of  the  form 

$$X(z) \mapsto R(z)X_0(z) + S(z)X_1(z), \qquad (8)$$

where R and S are arbitrary functions, can be used instead of ↓2. For example, if we use the Haar filter (1 + z)/2 to decompose a signal into its low and high components, the low component corresponds to choosing R(z) = S(z) = 1/2 in (8).
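Here is a small check of the Haar example. We take the filter as (1 + z)/2 and keep every other output so that the pairs (x_{2k}, x_{2k+1}) are averaged; this output alignment is our assumption. The resulting low component then equals (X_0 + X_1)/2, which is operator (8) with R(z) = S(z) = 1/2.

```python
from fractions import Fraction


def convolve(a, b):
    """Coefficients of A(z)B(z)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


x = [3, 1, 4, 1, 5, 9, 2, 6]
x0, x1 = x[0::2], x[1::2]                    # polyphase components (3) and (4)
haar = [Fraction(1, 2), Fraction(1, 2)]      # (1 + z)/2
low = convolve(haar, x)[1::2]                # entries (x_{2k} + x_{2k+1}) / 2
print(low == [Fraction(a + b, 2) for a, b in zip(x0, x1)])  # True
```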


[1]  G. BEYLKIN, On factored FIR approximation of IIR filters. Applied and Computational Harmonic
Analysis,  2,  pp.  293-298,  1995. 
Appendix C

Filter  Design  Tool  and  Architecture 


1.  Design  Methodology 

The design process (fig. 1) starts with a filter specification. The filter specification depends on the application and typically includes a frequency response characteristic and the desired windowing method. The optimization process is based on DSPCanvas. The optimization constraints are the SNR (signal to quantization noise ratio), the passband ripple, and the stopband attenuation. The target technology determines the hardware cost to implement the filter. The filter is then optimized for finite precision arithmetic, using cost functions specifically geared towards optimizing for the target technology. A VHDL generator is available to generate the filter automatically using the optimized finite precision results. A seamless flow is available to put the filter on hardware.


Figure 1 Design Flow


2.  FIR  Filter  Design 

The relationship between the FIR filter's input sequence x(n) and its output y(n) is

$$y(n) = \sum_{k=0}^{N-1} b_k\, x(n-k)$$

with N representing the number of coefficients in the filter. A structure directly derived from this relationship is shown in Figure 2, where the in-line triangles represent a multiplication by a coefficient b_k. Choosing these coefficient values is accomplished through standard filter design techniques.


Figure  2  FIR  Filter 
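A tap-by-tap implementation of this difference equation, mirroring the direct-form structure in Figure 2, can be sketched as follows. This is illustrative only: the function name is hypothetical and a zero initial state is assumed.

```python
def fir_filter(b, x):
    """y(n) = sum_{k=0}^{N-1} b_k x(n-k), with x(n) = 0 for n < 0."""
    delay_line = [0.0] * len(b)          # holds x(n), x(n-1), ..., x(n-N+1)
    y = []
    for sample in x:
        delay_line = [sample] + delay_line[:-1]
        y.append(sum(bk * xk for bk, xk in zip(b, delay_line)))
    return y


print(fir_filter([0.5, 0.25, 0.25], [1.0, 0.0, 0.0, 0.0]))  # [0.5, 0.25, 0.25, 0.0]
```

The impulse input reproduces the coefficients b_k, as expected for an FIR filter.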

There  are  several  ways  of  designing  and  implementing  a  FIR  filter  in  DSP  Canvas.  One  convenient 
method  is  to  use  the  Matlab  engine  interface  via  a  Matlab  model  in  a  DSP  Canvas  schematic.  Using 
this  model,  one  can  specify  a  desired  piecewise  linear  frequency  response,  type  of  filter,  and  the 
number of filter coefficients, as illustrated in Figure 3. Alternatively, a filter may be designed outside of
DSP  Canvas,  and  DSP  Canvas  simply  loads  a  vector  of  coefficients  from  a  data  file  into  a  standard  FIR 
filter  model. 


Figure  3  Filter  specification  for  Matlab 

3.  Optimization  Strategy 

The goal of the optimization process is to find the lowest cost solution that meets the filter design
constraints. The three components of the filter optimization system are:

•  Filter  and  architecture  parameters 

•  Filter  design  constraints 

•  Cost  function 

In  general,  the  filter  parameters  (such  as  coefficient  quantization  width)  determine  the  architectural 
parameters  (such  as  datapath  precision),  which  results  in  a  certain  implementation  cost.  As  the  filter 
parameters are varied, the filter constraints must be checked to ensure that the filter satisfies its design
specification.

3.1  FIR  Filter  Parameters 

There  are  four  classes  of  parameters  for  FIR  filters: 

•  Input  encoding 

•  Coefficient  encoding 

•  Accumulator  encoding 

•  Output  Encoding 

For the multiply-accumulate operations in Figure 2, consider a 2's complement representation. In DSP Canvas, the
precision of 2's complement numbers is represented as the pair <w,d>, where:

w is the total number of bits;

d defines the location of the binary point at d bits to the left of the LSB, i.e., the LSB has a value 2^{−d}.

Using this definition, the smallest non-zero value in magnitude is 2^{−d}, the greatest positive value is (2^{w−1} − 1)·2^{−d}, and the most negative value is −2^{w−1}·2^{−d}.
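The ranges follow directly from the <w,d> definition. The sketch below computes them and quantizes a value to the format. The function names are hypothetical, and rounding to nearest with saturation is our assumption, not necessarily what DSP Canvas does.

```python
def fixed_range(w, d):
    """(smallest positive, greatest positive, most negative) value of 2's complement <w,d>."""
    lsb = 2.0 ** -d
    return lsb, (2 ** (w - 1) - 1) * lsb, -(2 ** (w - 1)) * lsb


def quantize(value, w, d):
    """Round to the nearest multiple of 2**-d, then saturate to the <w,d> range."""
    lsb, hi, lo = fixed_range(w, d)
    q = round(value / lsb) * lsb
    return min(max(q, lo), hi)


print(fixed_range(8, 4))        # (0.0625, 7.9375, -8.0)
print(quantize(3.14159, 8, 4))  # 3.125
print(quantize(100.0, 8, 4))    # 7.9375 (saturated)
```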

Assume that the coefficients are encoded using canonical signed digit (CSD) representation, where each digit may be 1, 0, or −1. For example, the 2's complement representation of decimal 7 is 0111, whereas the CSD representation is 1001̄ (−1 is represented by the symbol 1̄), i.e., 8 − 1. Note that a CSD-encoded number evaluates to the same value as the corresponding 2's complement encoded number.
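CSD digits can be obtained with the standard non-adjacent-form recoding. It yields a representation in which no two adjacent digits are non-zero, and it uses the minimum number of non-zero digits. The sketch below is our own illustration, not DSP Canvas code, and its function names are hypothetical. It reproduces the example 7 → 1001̄, which has two non-zero digits against three in 0111.

```python
def csd_digits(n):
    """Canonical signed digit (non-adjacent form) of an integer, LSB first.

    Each digit is -1, 0 or 1, and no two adjacent digits are non-zero.
    """
    digits = []
    while n != 0:
        if n % 2:                 # odd: pick +1 or -1 so that n - d is divisible by 4
            d = 2 - (n % 4)       # n % 4 == 1 -> +1, n % 4 == 3 -> -1
            n -= d
        else:
            d = 0
        digits.append(d)
        n //= 2
    return digits


def csd_string(n):
    """Most significant digit first, with 1 plus a combining macron standing for -1."""
    symbols = {1: "1", 0: "0", -1: "1\u0304"}
    return "".join(symbols[d] for d in reversed(csd_digits(n))) or "0"


print(csd_digits(7))                          # [-1, 0, 0, 1]
print(sum(1 for d in csd_digits(7) if d))     # 2 non-zero digits
```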

In DSP Canvas, CSD numbers are represented as <w,d,nz>. The third parameter nz represents the maximum number of non-zero digits allotted for a number, and its range is limited to 0 < nz ≤ w/2. Note that there is some degree of quantization if nz < w/2.

Thus,  in  total  there  are  5  parameters  that  can  be  varied  while  satisfying  the  filter  specifications: 

Coefficient  encoding:  w,  d,  and  nz,  i.e.  coefficient<w,d,  nz> 

Accumulator  encoding:  w  and  d,  i.e.  accumulator<w,  d> 

The dialog box in Figure 4 illustrates the entry of these five parameters in DSPCanvas. The designer can
manually change these parameters to optimize them or use the scripts described in this note to
automatically search for optimal values.
Figure 4 Filter Optimization Parameters

Lastly, the encodings of the input and output values depend on the system requirements of the
application (e.g., A/D resolution), and thus are not varied by the optimization script.

3.2  Architectural  Parameters 

This project uses a shift-add accumulate architecture (fig. 5) for the FIR filter.


Figure 5 Shift-Add Architecture

The  memory  is  used  to  store  input  samples  of  the  filter.  The  width  of  the  memory  is  determined  by  the 
input  precision,  whereas  the  width  of  the  datapath  (i.e.  shifter,  accumulator  and  register)  is  determined 
by  the  coefficient  precision  and  input  precision.  While  the  ideal  datapath  precision  is  the  sum  of  the 
coefficient  width  and  input  width,  a  lower  precision  may  satisfy  the  specifications  and  can  be  obtained 
using  the  scripts  described  in  this  note. 
In a shift-and-add architecture, each non-zero digit in a coefficient requires one cycle of processing. Thus, while limiting the number of non-zero CSD digits introduces quantization error into the filter, it reduces the number of cycles (cycle count) required to compute each sample. Limiting the maximum number of non-zero digits (nz) therefore allows for a convenient tradeoff between the cycle count and the frequency response.

The two architectural parameters, namely datapath width and cycle count, are functions of the filter parameters. This function defines the cost of the filter, as discussed below.

3.3  Cost  Function 

The goal of optimization is to minimize a particular design's “cost”, which is provided by a cost function. A cost function appropriate for the shift-and-add architecture is

$$\text{architecture cost} = \text{memory cost} + \text{arithmetic cost} \times \left\lceil \frac{f_s \times \text{cycle count}}{\text{system frequency}} \right\rceil$$

where “memory cost” is the cost of memory and “arithmetic cost” is the cost of one shift-add datapath in Figure 5. The ceiling operation provides an integer representing the number of parallel datapaths required to meet the filter's sample rate requirement, given the underlying technology's circuit speed. Here f_s is the desired sample rate, “cycle count” is the number of shift-add cycles required per sample, and “system frequency” is the clock rate of the datapath (i.e., the number of cycles executed in one second). These multiple datapaths share one memory unit, as is reflected in the cost function. The illustration in Figure 6 shows the dependencies of the cost on the architectural parameters (cycle count and datapath width), and in turn their dependence on the filter parameters (w, d, nz). The filter parameters are determined by the specifications. The cost function can be symbolically specified in DSPCanvas or in a C program (depending on the complexity), as discussed in another section of the tutorial document.


Figure  6  Cost  calculation  for  FIR  structure 
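The cost model can be written down directly. The function below is a hypothetical stand-in for the external `cost_fir` C function mentioned later, whose real interface is not given here. It takes the architectural quantities as plain numbers. The unit costs, clock rate and sample rate in the example are made up for illustration.

```python
import math


def architecture_cost(memory_cost, arithmetic_cost, cycle_count,
                      sample_rate, system_frequency):
    """memory + arithmetic * ceil(sample_rate * cycle_count / system_frequency).

    The ceiling is the number of parallel shift-add datapaths needed so that
    `cycle_count` cycles per output sample fit into the available clock cycles.
    All datapaths share one memory, so memory is counted once.
    """
    datapaths = math.ceil(sample_rate * cycle_count / system_frequency)
    return memory_cost + arithmetic_cost * datapaths


# Made-up numbers: 60 cycles/sample at 1 MHz sampling on a 25 MHz datapath clock.
print(architecture_cost(memory_cost=400, arithmetic_cost=150, cycle_count=60,
                        sample_rate=1e6, system_frequency=25e6))  # 850
```

In this example, 60 cycles per sample at 1 MHz require 2.4 datapaths' worth of clock cycles, so three datapaths are instantiated.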

3.4  Filter  Constraints 

There are several specifications from traditional FIR filter design that are used as constraints in the optimization process. Clearly, one would like to ensure that the filter specifications are satisfied when optimizing the design. Refer to the frequency response curve in Figure 7, typical for a low-pass filter. The filter's passband extends from 0 to f1 Hz, whereas the stopband begins at f2 Hz.


Figure  7  Filter  Constraints 

The ripple in the passband is measured to be δ1 dB, and the filter's attenuation at f2 Hz is δ2 dB. We use δ1 and δ2 as constraints in the optimization script. Although simplified for some filter applications, using these two specifications as design constraints is applicable for most low-pass and high-pass filters.

Furthermore, there is a relationship between the accumulator bit-width and the quality of the filter's response. As the precision of the accumulator in a shift-and-add architecture is reduced, the response deviates from that of a purely floating-point system; this deviation is modeled as noise in the system response. Thus, when a finite precision accumulator is introduced, we measure the signal-to-noise ratio (SNR) of the finite precision filter response y_finite relative to a floating-point design y_ref:

$$\text{SNR} = 10\log_{10}\frac{E\bigl[y_{\text{ref}}^2\bigr]}{E\bigl[(y_{\text{ref}} - y_{\text{finite}})^2\bigr]}$$

Figure 8 SNR Specification
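If the expectations are replaced by averages over a finite simulation run (our assumption), the SNR of a finite-precision output against the floating-point reference can be computed as below. The function and signal names are illustrative. The rounded sinusoid merely stands in for a fixed-point output.

```python
import math


def snr_db(y_ref, y_finite):
    """10*log10( mean(y_ref^2) / mean((y_ref - y_finite)^2) )."""
    signal = sum(r * r for r in y_ref) / len(y_ref)
    noise = sum((r - f) ** 2 for r, f in zip(y_ref, y_finite)) / len(y_ref)
    return float("inf") if noise == 0 else 10 * math.log10(signal / noise)


# Reference: a sinusoid. Finite-precision stand-in: the same values rounded to 2**-8.
y_ref = [math.sin(2 * math.pi * 0.05 * n) for n in range(1000)]
y_fin = [round(v * 256) / 256 for v in y_ref]
print(round(snr_db(y_ref, y_fin), 1))
```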

The next section describes how the frequency specifications and SNR constraints are expressed in the
optimization scripts in DSPCanvas.

4.  Optimization  Tool  in  DSP  Canvas 

The five optimization variables noted in Figure 4 create a five-dimensional space over which one is to find a minimum cost solution. In order to reduce the search time, we adopt a two-step optimization strategy, taking advantage of certain theoretical facts to partition the search. The filter specifications of passband ripple and stopband attenuation are affected only by how the coefficients are encoded, whereas the SNR is affected by the accumulator.

The optimization script is shown in Figure 9, where there are two multivariate optimization loops.




In  the  first  loop  of  the  optimization  script,  the  coefficient  parameters  are  varied,  while  the  accumulator 
remains  floating  point;  each  time  the  coefficient’s  encoding  is  changed,  the  filter’s  passband  ripple  and 
stopband  attenuation  are  checked  to  ensure  they  meet  specifications.  Once  minimum  cost  coefficients 
are  discovered,  the  script  advances  to  the  second  loop  where  the  accumulator  precision  is  varied  and 
the  SNR  is  measured  relative  to  the  original  floating  point  design.  The  two  loops  are  described  below. 


Figure  9  Optimization  script 
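The two-loop search can be summarized in a sketch. The code below is not the DSP Canvas script. The constraint checks and the cost are passed in as hypothetical callables, and an exhaustive grid search stands in for whatever search strategy the optimization block actually uses. The toy constraints and costs at the end are invented purely to exercise the code.

```python
from itertools import product


def two_step_search(coef_grid, acc_grid, meets_freq_spec, meets_snr, cost):
    """Loop 1: cheapest coefficient<w,d,nz> meeting ripple/attenuation with a
    floating-point accumulator (acc=None). Loop 2: with those coefficients
    fixed, cheapest accumulator<w,d> meeting the SNR constraint."""
    best_coef = min((c for c in product(*coef_grid) if meets_freq_spec(c)),
                    key=lambda c: cost(c, None), default=None)
    if best_coef is None:
        return None
    best_acc = min((a for a in product(*acc_grid) if meets_snr(best_coef, a)),
                   key=lambda a: cost(best_coef, a), default=None)
    return None if best_acc is None else (best_coef, best_acc)


def toy_freq_spec(c):
    w, d, nz = c
    return d < w and 2 * nz <= w and w >= 12 and nz >= 3


def toy_snr(c, a):
    return a[1] < a[0] and a[0] >= c[0] + 6


def toy_cost(c, a):
    return c[0] * c[2] + (0 if a is None else a[0])


coef_grid = (range(8, 17), range(4, 15), range(1, 9))   # w, d, nz
acc_grid = (range(12, 33), range(4, 31))                # w, d
print(two_step_search(coef_grid, acc_grid, toy_freq_spec, toy_snr, toy_cost))
# ((12, 4, 3), (18, 4))
```

Splitting the search this way examines the coefficient grid and the accumulator grid separately instead of their five-dimensional product, which is the saving the two-step strategy aims for.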

4.1  Fixed  Point  Coefficient  Optimization 

The  first  loop  of  the  script  repeatedly  simulates  the  filter  for  different  combinations  of  coefficient 
encoding  and  checks  whether  the  frequency  response  satisfies  the  specifications,  while  calculating  the 
cost. 

The variables (fig. 10) ripple and atten correspond to the filter specifications δ1 and δ2,
respectively. These variables are used in the optimization block as constraints, as is illustrated in
Figure 10. The optimization script uses the search space defined by the ranges of the loop variables. In
Figure 10 these are listed as b_fir, b_fir_prec, and b_fir_nz; these variables correspond to the
three parameters in coefficient<w,d,nz>. Note that the cost function is specified as an
externally defined function. In this case it is a C function named “cost_fir”, which takes the filter
parameters from the command-line input, formulates the architectural parameters of datapath_width
and cycle_count, and returns an architectural cost.
Figure 10 Optimization Dialog Box

The simulation schematic used in the loop for coefficient optimization is shown in Figure 11. As
illustrated, the impulse response of the filter is being measured. The parameters
coefficient<w,d,nz> are varied and passed to this schematic from the optimization script; note
that the input and accumulator of the filter are specified to be floating point. Once simulated, the
frequency and phase response are stored in data files. The optimization script uses the “set” block to
calculate the resulting passband ripple and stopband attenuation to check if the constraints are
satisfied. To do this, the “set” block in the script reads the frequency response data files (using the
“get_col_op” function) and performs the following computations using “jet_set_eval” commands:

w1=floor(fft_length*f1)
w2=ceil(fft_length*f2+1)
max_ripple=max(filter response in passband)
min_ripple=min(filter response in passband)
ripple=max_ripple-min_ripple
atten_sb=(filter response at w2)
atten=atten_sb-min_ripple
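For readers without DSP Canvas, the same post-processing can be sketched in Python. We assume that the stored magnitude response is in dB on a uniform FFT grid, that f1 and f2 are given as fractions of the FFT length, and that the passband consists of bins 0 through w1. These assumptions and the names are ours. Following the script, the attenuation is taken relative to the passband minimum (atten = atten_sb − min_ripple).

```python
import math


def ripple_and_atten(mag_db, f1, f2):
    """Mirror of the script: passband bins 0..w1, stopband level read at bin w2."""
    fft_length = len(mag_db)
    w1 = math.floor(fft_length * f1)
    w2 = math.ceil(fft_length * f2 + 1)
    passband = mag_db[: w1 + 1]
    max_ripple, min_ripple = max(passband), min(passband)
    ripple = max_ripple - min_ripple
    atten = mag_db[w2] - min_ripple
    return ripple, atten


# Toy response: 0.1 dB peak-to-peak ripple in the passband, -60 dB in the stopband.
mag = [0.05 if k % 2 else -0.05 for k in range(20)] + [-60.0] * 44
print(ripple_and_atten(mag, 0.25, 0.4))
```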
Figure 11 Optimization Schematic

4.2 Fixed Point Accumulator Optimization

Once the first loop has finished, the best combination of coefficient<w,d,nz> is fixed and the
script moves on to the second loop (the bottom half of the script in Figure 9).


Figure 12 SNR Optimization Dialog Box
The optimization settings for this loop are illustrated in Figure 12. Note that the accumulator encoding settings are now the loop variables, and the SNR is the constraint. To calculate the SNR, the second loop simulates the schematic in Figure 13.

The strategy of this schematic is to investigate finite precision effects by comparing the performance of a filter with a fixed-point datapath versus a filter with a floating-point datapath. As illustrated, there are two filters being used. The bottom filter is a complete fixed-point FIR filter, where the coefficient<w,d,nz> encoding is the one determined in the first loop, and the accumulator encoding is varied by the second loop. Furthermore, the input and output precisions of the filter are specified. The FIR filter in the upper portion of the figure differs only in that the accumulator is floating point; this is termed the “reference” filter. A single sinusoid centered in the filter's passband is used as input stimulus to the two filters, and the resulting SNR is measured and passed back to the optimization script. This value is used as the SNR constraint in Figure 12. As in the first loop, the external cost function is called to provide the architectural cost of the system.
Figure 13 SNR Optimization schematic

Once  the  second  loop  has  finished,  optimal  values  of  all  parameters  are  determined  and  the 
optimization  script  terminates. 

5.  Factored  FIR  filter  design 

A factored FIR approximation of an IIR filter is an alternative implementation of that IIR filter. A factored
FIR approximation for the IIR filter described above is generated using the Colorado design
software. This single stage filter contains 40 factors (maximum delay 2048). IIR filters are
prone to instability in fixed precision arithmetic. This is a method to stabilize the IIR filter with very
little extra cost while, at the same time, preserving the sharp cutoff features of the IIR filter.

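The factored FIR filter has a transfer function of the form

$$H(z) = P(z)\prod_{i} F_i(z)$$

where P(z) is a conventional FIR section and F_i(z) is the i-th factor of the cascade.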

Figure 15 shows the structure of the factored FIR filter.
P(z) has the same parameters as a conventional FIR filter; the cascaded factors, however, present additional parameters for optimization. The factored FIR filter has a very high memory cost: although the number of arithmetic units is reduced, the memory cost rises sharply. Each factor is essentially a 3-tap FIR filter, but the delays between the taps are variable. The tap delays P_i and Q_i are generally much greater than 1, which increases the memory cost, especially on FPGA architectures that lack dedicated memory elements, such as the Xilinx 4000. Architectures such as the Xilinx Virtex devices may be much more efficient, since they have embedded memory elements. Figure 14 shows the cost calculation for the factored FIR structure.




Figure  15  Factored  FIR  structure 
5.1 Sub-sampled factored FIR

The factored FIR filter has a very high memory cost. To reduce it, the sub-sampled factored FIR filter (fig. 16) was developed on this project. It preserves all the advantages of the factored FIR filter while reducing the memory cost.


Figure 16 Sub-sampled Factored FIR Structure

An additional degree of freedom in the optimization is the number of factors used. For example, if a single stage requires 40 factors with a maximum storage of 512, then a sub-sampled multistage design might require, in the worst case, 24 factors with a maximum storage of 128. Factors with large storage have small coefficients, and the optimization tool exploits this by dropping factors in order of increasing coefficient magnitude, smallest first.
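A minimal sketch of this pruning step follows. It assumes factors of the form 1 + b z^-m + c z^-2m whose coefficients shrink as the delay m grows (as they do when the factors approximate a decaying pole pair), and a simple heuristic stopping rule: drop factors smallest-first while the summed magnitude of the dropped coefficients stays within a budget. The optimization tool's actual criterion is not given here, so `budget` and all function names are hypothetical.

```python
import numpy as np

def make_factors(pole: complex, num_factors: int):
    """(m, b, c) for hypothetical 3-tap factors 1 + b z^-m + c z^-2m, m = 2**k."""
    factors = []
    for k in range(num_factors):
        m = 2 ** k
        pm = pole ** m
        factors.append((m, 2.0 * pm.real, abs(pm) ** 2))
    return factors

def coeff_size(factor) -> float:
    """Largest coefficient magnitude in a factor (the leading 1 is excluded)."""
    _, b, c = factor
    return max(abs(b), abs(c))

def storage(factors) -> int:
    """Delay-line samples needed: a factor with taps at 0, m, 2m stores 2m samples."""
    return sum(2 * m for m, _, _ in factors)

def prune_factors(factors, budget: float):
    """Drop factors in order of increasing coefficient size while the summed
    size of the dropped coefficients stays within `budget` (heuristic)."""
    dropped, spent = [], 0.0
    for f in sorted(factors, key=coeff_size):
        if spent + coeff_size(f) > budget:
            break
        dropped.append(f)
        spent += coeff_size(f)
    dropped_delays = {m for m, _, _ in dropped}
    kept = [f for f in factors if f[0] not in dropped_delays]
    return kept, dropped

factors = make_factors(0.95 * np.exp(0.3j), 10)
kept, dropped = prune_factors(factors, budget=1e-3)
print(storage(factors), "->", storage(kept), "samples;",
      "dropped delays:", sorted(m for m, _, _ in dropped))
```

Because the smallest coefficients sit on the longest delays, each dropped factor removes a disproportionately large share of the delay-line storage.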

5.2 COST COMPARISON PLOTS

As shown in figures 17-22, the sub-sampled factored FIR filter design developed on this project offers a significant cost reduction (fig. 19) while providing superior performance (figures 21-22).
Figure 17 Sub-sampled Factored FIR requires lowest coefficient precision
Figure 18 Subsampled Filter has superior SNR performance
Figure 19 Subsampled Factored FIR offers lowest cost
Figure 20 Cost Break-down
Figure 21 Frequency response comparison


Subsampled  Factored  FIR  (0.18  dB) 


Figure 22: Optimized fixed-point design versus Matlab floating-point design
6. Computational Architecture Design for FPGA


This filter architecture (figures 23 and 24) is composed of storage elements implemented with dual-port RAMs, barrel shifters that form the products with the coefficients, and an accumulator that performs the additions. The filter receives samples at a rate fs, and the system clock runs at a multiple of fs. Given the number of computations that must be performed per sample, a certain degree of parallelism is necessary to sustain the desired sample rate.


Figure  23  FIR  Structure 


Figure  24  RAM  storage  architecture 


The RAM (figure 25) is used to store the input data samples. It is a dual-port synchronous RAM, 16 words deep, with the same width as the input sample word. The RAM ports are write enable (WE), write address (A_W), read address (A_R), clock (CLK), input data (D_IN) and output data (D_OUT).
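To make the port list concrete, here is a minimal behavioral model of such a RAM used as a circular sample buffer. It is a sketch, not the generated VHDL: the class name, the `clock()` method standing in for a CLK edge, and read-before-write behavior on an address collision are assumptions of this model.

```python
class DualPortRam:
    """Behavioral model of a synchronous dual-port RAM with ports WE, A_W, A_R,
    D_IN and D_OUT; one call to clock() represents one CLK edge."""

    def __init__(self, depth: int = 16, width: int = 16):
        self.depth = depth
        self.mask = (1 << width) - 1
        self.mem = [0] * depth

    def clock(self, we: bool, a_w: int, d_in: int, a_r: int) -> int:
        """Return the word at A_R (D_OUT), then write D_IN to A_W if WE is set.
        Read-before-write on an address collision is assumed, not specified."""
        d_out = self.mem[a_r % self.depth]
        if we:
            self.mem[a_w % self.depth] = d_in & self.mask
        return d_out

# Circular buffer: sample i is written at address i mod 16; the k-th most
# recent sample is read back from address (newest - k) mod 16.
ram = DualPortRam()
for i in range(20):
    ram.clock(we=True, a_w=i, d_in=100 + i, a_r=0)
print([ram.clock(we=False, a_w=0, d_in=0, a_r=19 - k) for k in range(4)])  # [119, 118, 117, 116]
```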
Figure 25 RAM Architecture

Multiplication by a power of two can be implemented on binary numbers by shifting the multiplicand by the exponent. A coefficient in CSD form is a sum of signed powers of two, so its product is built from one shift per non-zero digit, with the shifted terms added in the accumulator. The shifts are implemented by the barrel shifter, which takes a binary control number indicating the amount of the shift. The inputs of the barrel shifter are the binary samples input to the filter. The barrel shifter is implemented with rows of multiplexors (figure 26). The multiplexors are 4-to-1, and each row performs a 0 to 3 shift.
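The sketch below models the barrel shifter at the bit-value level. It assumes that row r of 4:1 multiplexors shifts by 0, 1, 2 or 3 times 4^r positions (two select bits per row), so two rows cover shifts of 0 to 15; it also assumes right shifts, since fractional coefficients scale samples down. Function names are hypothetical.

```python
def barrel_shift_right(x: int, amount: int, rows: int = 2) -> int:
    """Arithmetic right shift built from `rows` rows of 4:1 multiplexors.
    Row r chooses a shift of 0, 1, 2 or 3 times 4**r bit positions."""
    if not 0 <= amount < 4 ** rows:
        raise ValueError("shift amount out of range for this number of rows")
    value = x
    for r in range(rows):
        select = (amount >> (2 * r)) & 0b11                      # 2-bit mux select
        mux_inputs = [value >> (k * 4 ** r) for k in range(4)]   # the 4 mux inputs
        value = mux_inputs[select]
    return value

def shift_add_multiply(x: int, terms) -> int:
    """Multiply x by sum(sign * 2**-shift), one barrel shift per non-zero digit."""
    return sum(sign * barrel_shift_right(x, shift) for sign, shift in terms)

# 0.375 = 2**-1 - 2**-3: two shifts and one subtraction replace a multiplier.
print(shift_add_multiply(64, [(+1, 1), (-1, 3)]))   # 24
```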


Figure  26  Barrel  Shifter 
Shifter-based transpose form:

The transpose form of the FIR filter, based on shifts and add/sub operations, also uses CSD notation to minimize the number of add/subs. For smaller filters (fewer taps) and for small system_clock/sample_clock ratios, this architecture can perform better than the RAM-based architecture.
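The CSD (canonical signed-digit) recoding that minimizes the add/subs can be computed with the standard non-adjacent-form algorithm, sketched below. The function names are illustrative; each non-zero digit corresponds to one shift-and-add/sub operation.

```python
def to_csd(n: int) -> list[int]:
    """Canonical signed-digit (non-adjacent form) digits of n, least significant first.
    Digits are -1, 0 or +1 and no two adjacent digits are both non-zero."""
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)   # n % 4 == 1 -> +1, n % 4 == 3 -> -1
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits

def nonzero_ops(n: int) -> int:
    """Number of non-zero operations a coefficient costs: its non-zero CSD digits."""
    return sum(1 for d in to_csd(n) if d != 0)

for coeff in (7, 15, 0b01110111):
    print(coeff, "binary ones:", bin(coeff).count("1"),
          "CSD non-zero digits:", nonzero_ops(coeff))
```

For example, 7 = 111 in binary needs three additions, while its CSD form 8 - 1 needs one shift-add and one shift-subtract.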

7.  Storage  Architecture  Design 

1.  Arithmetic  Units  with  RAM  for  storage  of  input  words. 

The  RAM  used  in  this  architecture  is  a  dual  port  RAM  as  shown  below: 
Memory and control logic cost calculation:

System clock to sample clock ratio: Fsys/Fs <= 16

Number of arithmetic units (AUs), which depends on the number of non-zero operations (nzops) and on Fsys/Fs: nzops * Fs/Fsys

For each RAM, the cost of control logic is:

4-bit counter for address read: 2 CLBs @ Fsys

4-bit counter for address write: 2 CLBs @ Fs

initial counter value: 2 + 2 CLBs @ Fsys

final counter value: 2 + 2 CLBs @ Fsys

total for each AU: 12 CLBs
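The arithmetic above can be collected into a small helper. This is a sketch: the function name is hypothetical, nzops is taken to be the total number of non-zero operations per output sample, and the AU count nzops * Fs/Fsys is rounded up to a whole unit (an assumption, since a fractional AU cannot be built). The RAM storage itself is not included here.

```python
RAM_DEPTH = 16   # words per dual-port RAM; also the upper bound on Fsys/Fs
# Control CLBs per RAM/AU, from the list above:
# read counter 2 + write counter 2 + initial value (2 + 2) + final value (2 + 2)
CONTROL_CLBS_PER_AU = 2 + 2 + (2 + 2) + (2 + 2)

def ram_architecture_control(nzops: int, fsys_over_fs: int) -> tuple[int, int]:
    """Return (number of AUs, control-logic CLBs) for the RAM-based architecture."""
    if not 1 <= fsys_over_fs <= RAM_DEPTH:
        raise ValueError("this architecture assumes 1 <= Fsys/Fs <= 16")
    num_au = -(-nzops // fsys_over_fs)   # nzops * Fs / Fsys, rounded up
    return num_au, num_au * CONTROL_CLBS_PER_AU

print(ram_architecture_control(200, 10))   # (20, 240)
```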

2.  Use  of  registers  for  storage  of  input  words. 

In this architecture (shown below), the input words are stored in shift registers. Each arithmetic unit accesses the relevant input sample using multiplexors.
• Cost of Memory

Each input sample is stored in one register bank and needs N bits, where N is the width of the input word. Two bits can be stored in one CLB.


Cost  for  one  tap  (CLBs):  ceil  [(input  word  width)  /  2] 

Total cost of memory: (number of taps) * {ceil [(input word width) / 2]}


•  Cost  of  control  logic: 

The control logic needs to select the appropriate sample for each of the arithmetic units. This control logic is implemented with a series of multiplexors (figure 27).


The number of Mux’s per AU depends on the number of samples accessed.

Each CLB can hold up to one 4:1 Mux.

Average number of samples accessed/AU = (# taps * Fsys)/(Fs * nzops)

B = ceil[(tap*Fsys)/(4*nzops*Fs)], the number of 4:1 Mux’s needed in the first row


Taps/AU      Mux’s/AU
1            0 (wired)
<= 4         B
<= 16        B + 1
<= 28        B + 2
<= 40        B + 3

Table 1


Control:

Counter: only one counter is needed for all AUs: ceil [log2(Fsys/Fs) / 2] CLBs

Controls for multiplexors:

Each 4:1 Mux needs 2 control signals: 1 CLB
For each AU: ceil [Fsys/(16*Fs)]

Total cost of control: ceil [Fsys/(16*Fs)] * (number of Mux’s)

•  Total  Cost: 


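Total_cost = [Memory_cost] + [(Mux/AU) · (AU)] + [counter] + [controls_Mux]

Substituting the expressions above:

$$\text{Total\_cost} = \text{tap}\cdot\left\lceil\frac{\text{inword}}{2}\right\rceil + \left(\left\lceil\frac{\text{tap}\cdot F_{sys}}{4\cdot\text{nzops}\cdot F_s}\right\rceil + \delta\right)\cdot\frac{F_s\cdot\text{nzops}}{F_{sys}}\cdot\left(1 + \left\lceil\frac{F_{sys}}{16\cdot F_s}\right\rceil\right) + \left\lceil\frac{\log_2(F_{sys}/F_s)}{2}\right\rceil$$

where δ = 0, 1, 2 or 3 represents the additional multiplexors for the corresponding number of samples/AU, as shown in Table 1 (a single sample per AU is wired and needs no multiplexor).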


Figure  27  Multiplexor  architecture  to  select  coefficients 
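The register-architecture cost model above can be evaluated with a short function. This is a sketch under stated assumptions: function names are hypothetical, nzops is the total number of non-zero operations, the AU count is rounded up to a whole unit, and Table 1 is applied to the average number of samples per AU.

```python
import math

def muxes_per_au(samples_per_au: float) -> int:
    """4:1 multiplexors per AU, following Table 1 with B = ceil(samples / 4)."""
    if samples_per_au <= 1:
        return 0                          # wired: no multiplexor needed
    b = math.ceil(samples_per_au / 4)     # first row of 4:1 muxes
    for limit, extra in ((4, 0), (16, 1), (28, 2), (40, 3)):
        if samples_per_au <= limit:
            return b + extra
    raise ValueError("Table 1 covers at most 40 samples per AU")

def register_architecture_cost(taps, inword, nzops, fsys_over_fs):
    """Total CLB cost of the register-based storage architecture (sketch)."""
    memory = taps * math.ceil(inword / 2)               # 2 bits per CLB
    num_au = math.ceil(nzops / fsys_over_fs)            # nzops * Fs / Fsys, rounded up
    samples_per_au = taps * fsys_over_fs / nzops
    muxes = muxes_per_au(samples_per_au) * num_au
    mux_control = math.ceil(fsys_over_fs / 16) * muxes  # ceil(Fsys/(16 Fs)) per mux
    counter = math.ceil(math.log2(fsys_over_fs) / 2)    # one shared counter
    return memory + muxes + mux_control + counter

# 100 taps, 16-bit input, 2 non-zero ops per tap (nzops = 200), Fsys/Fs = 10
print(register_architecture_cost(100, 16, 200, 10))   # 922
```

In this example the 800 CLBs of register storage dominate; the multiplexors, their control and the counter add 122 CLBs.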

7.1 Architecture Comparison Plots

To select the optimal storage architecture, the cost of each structure was plotted as shown below (figures 28-38) for various fixed-precision parameter values. Each figure has four plots:

Upper  Left  Plot  (UL): 
Shows the cost of memory and associated logic for RAM based architecture and
Register  based  architecture.  The  red  line  is  for  the  RAM  and  blue  is  for  the  register-based 
architecture. 

Upper  Right  Plot  (UR): 

Shows  the  total  cost  of  memory,  control  logic  and  arithmetic  unit  (accumulator), 
for  the  RAM  and  register  architectures. 

Bottom  Left  Plot  (BL): 

Shows the cost of registers and access logic. In this plot, blue is for 1 non-zero op/tap, green for 2 non-zero ops/tap, red for 3 non-zero ops/tap and light blue for 4 non-zero ops/tap.

Bottom  Right  Plot  (BR): 

Shows the total cost for all 3 architectures: the RAM-based architecture in red, the register-based architecture in blue, and the transpose form as light-blue asterisk lines.


Notes: 

The plots for the RAM configuration show an increase in discrete steps. These steps correspond to the frequency ratio fr = fsystem/fsample. In 1 sample clock cycle, we can operate on fr non-zero bits of the coefficients. If the ratio increases, we need fewer RAMs; if it decreases, we need more RAMs. Each RAM can buffer 16 input samples. As the ratio increases, the discretization (step width) increases linearly.


For the plot displaying the register configuration, we notice that as the number of non-zero operations increases, the curves do not increase linearly. This is due to the discretization performed by the ceil(A) operation, which rounds the value A up to the next integer. The cost function for registers contains the term

$$\left\lceil\frac{\text{tap}\cdot F_{sys}}{4\cdot\text{nzops}\cdot F_s}\right\rceil\cdot\frac{F_s\cdot\text{nzops}}{F_{sys}}$$

which accounts for this trend. For a fixed number of taps and a fixed frequency ratio, varying the number of non-zero operations gives the following results.


Cost 

CLB 

A 


Fsys/fs=5 


Non-zero 

ops 


Fsys/fs=10 
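The non-linear trend can be reproduced by evaluating the term above directly. This sketch assumes nzops = taps × (non-zero ops per tap), which is how the plots are parameterized, and keeps the AU factor fractional exactly as written in the formula; the function name and the 100-tap example are hypothetical.

```python
from fractions import Fraction

def mux_term(taps: int, nz_per_tap: int, fsys_over_fs: int) -> Fraction:
    """ceil(tap*Fsys / (4*nzops*Fs)) * (Fs*nzops / Fsys), with nzops = taps * nz_per_tap."""
    nzops = taps * nz_per_tap
    first_row = -(-(taps * fsys_over_fs) // (4 * nzops))   # integer ceiling
    num_au = Fraction(nzops, fsys_over_fs)                 # fractional, as in the formula
    return first_row * num_au

for ratio in (5, 10):
    print(ratio, [int(mux_term(100, k, ratio)) for k in (1, 2, 3, 4)])
```

With Fsys/fs = 10 the term rises and falls (30, 40, 30, 40) as the non-zero ops per tap go from 1 to 4, because the ceiling drops from 3 to 2 to 1 while the AU count grows linearly; with Fsys/fs = 5 it gives 40, 40, 60, 80.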
The CLB cost of the AU is greater than the cost of memory. The effect is seen when we compare plot 2 and plot 3. The cost in plot 2 is not linear, as explained in the previous note. The cost in plot 3, corresponding to the register configuration, increases linearly, because the cost of the AU increases linearly with the number of non-zero operations/tap and this increase is much greater than the cost of memory.

Figures 28-38 show the same plots for different values of Fclk/Fs, input word width and accumulator width (bit precision):


Figure 28 Storage Cost Comparison (Fclk/Fs = 15, inword = 16, data path = 24)

Figure 29 Storage Cost Comparison (Fclk/Fs = 10, inword = 16, data path = 24)

Figure 30 Storage Cost Comparison (Fclk/Fs = 2, inword = 16, data path = 24)

Figure 31 Storage Cost Comparison (Fclk/Fs = 15, inword = 16, data path = 32)

Figure 32 Storage Cost Comparison (Fclk/Fs = 10, inword = 16, data path = 32)

Figure 33 Storage Cost Comparison (Fclk/Fs = 15, inword = 8, data path = 16)

Figure 34 Storage Cost Comparison (Fclk/Fs = 10, inword = 8, data path = 16)

Figure 35 Storage Cost Comparison (Fclk/Fs = 15, inword = 8, data path = 12)

Figure 36 Storage Cost Comparison (Fclk/Fs = 10, inword = 8, data path = 12)

Figure 37 Storage Cost Comparison (Fclk/Fs = 15, inword = 16, data path = 40)

Figure 38 Storage Cost Comparison (Fclk/Fs = 10, inword = 16, data path = 40)




8.  Performance  Optimization  for  FPGA  Architecture 

The RAM-based architecture puts stringent constraints on the clocking scheme. The input samples arrive on sample clock edges, while the arithmetic operations happen on system clock edges. The ratio Fclk/Fs (system clock frequency / sample clock frequency) specifies the number of arithmetic (shift-add-accumulate) operations that are possible before the next sample arrives. The input samples are stored in the RAMs. Depending on the Fclk/Fs ratio and the number of non-zero bits in the CSD representation of the coefficients, there can be many parallel RAM/arithmetic units. These are daisy chained: as the first RAM reads a sample, that sample is written to the next RAM (RAM 2).


Figure  39  RAM  based  architecture 

However, depending on the number of non-zero bits per coefficient and the way they are organized, an Arithmetic Unit (AU) may look for a particular sample in a particular RAM before that sample has arrived at that RAM. For example, consider the case where Fclk/Fs = 4 and the first coefficient has 6 non-zero bits. In this case, the input data needs to be written to two RAMs simultaneously, so the RAMs cannot be daisy chained as above but must be connected as shown in Figure 40. This breaks the “clean” daisy chain that would otherwise be possible, and also creates an architecture with very different timing behavior because of the capacitive loading and routing of the additional fanouts. This can become a serious problem if the same input fans out to 3 or more RAMs. DSPCanvas can automatically determine when this kind of connection is necessary (Style 0 and Style 1 in the filter parameters menu control this).

Figure 40 Load balancing of arithmetic units

However, by slightly adjusting the coefficients (manually), the first architecture can be made to work. For instance, at a quantization of 8 bits, 0.002104759 is encoded as 10001001; changing it slightly to 0.002120971 gives 10001011, and changing it to 0.002075195 gives 10001000. Such changes alter the number of non-zero bits, which can significantly change the allocation of coefficients to arithmetic units and could result in moving from Style 1 to Style 0 (the clean daisy chain). In such a case, the COEFFICIENTS.d file can be edited manually to reflect the new coefficient and the VHDL generator rerun. This can significantly improve timing and routing on the FPGA.
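The effect of the non-zero bit count on AU allocation can be illustrated with a simple packing model. This is a hypothetical heuristic, not the DSPCanvas allocator: each coefficient's non-zero operations are packed in tap order into AUs with Fclk/Fs slots each, and a coefficient whose operations straddle two AUs is taken to need its input sample in two RAMs at once (the Style 1 connection).

```python
def allocate(nz_per_coeff, ops_per_au):
    """Pack each coefficient's non-zero operations, in tap order, into AUs with
    `ops_per_au` (= Fclk/Fs) slots; return the set of AUs each coefficient uses."""
    spans, slot = [], 0
    for nz in nz_per_coeff:
        if nz == 0:
            spans.append(set())   # zero coefficient: no operations
            continue
        first, last = slot // ops_per_au, (slot + nz - 1) // ops_per_au
        spans.append(set(range(first, last + 1)))
        slot += nz
    return spans

def max_fanout(spans):
    """Largest number of AUs (RAMs) that must see the same input sample."""
    return max((len(s) for s in spans), default=0)

print(allocate([6, 2], 4))   # [{0, 1}, {1}]: the first sample feeds two RAMs

# First-tap bit patterns quoted in the text, followed by a hypothetical 2-op tap.
for first_tap in (0b10001001, 0b10001011, 0b10001000):
    nz = bin(first_tap).count("1")
    spans = allocate([nz, 2], 4)
    print(bin(first_tap), nz, spans, "fanout", max_fanout(spans))
```

Under this model, the 3-bit pattern pushes the next tap across an AU boundary, while the 4-bit and 2-bit variants keep every tap within a single AU.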


Frequency response of 2 filters with their first tap slightly altered
Impulse response of 2 filters with their first tap slightly altered


Figure  41  Filter  modifications  to  reduce  logic  complexity 
When the number of coefficients is small or the Fclk/Fs ratio is small, the regular transpose-form FIR can perform very well. This architecture does not involve multiple clocks and has no RAM overhead. However, if the Fclk/Fs ratio is very high (> 10) and the filter is large (> 50 taps), the RAM-based architecture can provide significant area savings. The CSD notation minimizes the number of non-zero operations, so fewer parallel RAMs are needed.

In some cases this can be a problem. For example, if nz_ops (the number of non-zero bits) is not a multiple of the Fclk/Fs ratio, the last AU is not fully utilized (figure 42): there are system clock cycles that the accumulator must ignore. When the Fclk/Fs ratio is high, this can lead to significantly more hardware in the form of comparators and multiplexors. To avoid this, nz_ops can be made a multiple of the Fclk/Fs ratio, which removes the unnecessary logic.
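The utilization arithmetic is simple enough to state as a helper. The function name is hypothetical; `ops_per_au` is the Fclk/Fs ratio, and the idle slots are the system clock cycles the last accumulator must ignore.

```python
def au_utilization(nz_ops: int, ops_per_au: int):
    """Return (AUs needed, idle slots in the last AU, nz_ops rounded up to a multiple)."""
    num_au = -(-nz_ops // ops_per_au)      # integer ceiling
    idle = num_au * ops_per_au - nz_ops    # cycles the last accumulator must ignore
    return num_au, idle, num_au * ops_per_au

print(au_utilization(23, 10))   # (3, 7, 30)
print(au_utilization(30, 10))   # (3, 0, 30)
```

When the idle count is zero, the last AU needs none of the extra comparators and multiplexors described above.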


Figure  42  The  last  stage  of  the  parallel  AU 